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ABSTRACT 


The  concept  of  the  wiretap  channel  was  first  proposed  by  Wyner.  He 
considered  the  case  in  which  data  is  to  be  transmitted  reliably  over  a 
discrete  memoryless  main  channel  to  a legitimate  receiver.  The  wiretapper 
views  the  output  of  the  main  channel  through  another  discrete  memoryless 
channel.  It  is  assumed  that  the  wiretapper  knows  the  code  being  used  and 
his  only  handicap  is  the  additional  noise  in  his  signal.  The  problem  is 
to  maximize  the  transmission  rate  R to  the  legitimate  receiver  and  the 
equivocation  d of  the  wiretapper. 

In  this  dissertation,  the  additive  white  Gaussian  noise  wiretap 
channel  is  introduced  and  the  set  of  all  achievable  (R,d)  pairs  is 
determined  explicitly  through  the  use  of  certain  special  properties  of 
the  Gaussian  channel.  Some  useful  characterizations  of  a special  class  of 
wiretap  channels  are  also  explored. 

A model  of  the  wiretap  channel  with  feedback  is  proposed.  It  turns 
out  that  with  the  introduction  of  feedback,  even  when  the  main  channel  is 
inferior  to  the  wiretapper's  channel,  it  is  still  possible  to  reliably 
communicate  with  the  legitimate  receiver  in  complete  secrecy.  The  binary 
erasure  wiretap  channel  with  feedback  is  examined  and  inner  and  outer 
bounds  on  the  achievable  (R,d)  region  are  given. 

Finally,  a scheme  for  enlarging  the  capacity  region  of  multiple- 
access  channels  using  feedback  is  analyzed.  It  is  shown  that  conditions 
under  which  an  enlargement  is  possible  are  fairly  weak,  indicating  that 
feedback  can  almost  always  increase  the  capacity  region. 
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CHAPTER  1 


INTRODUCTION 

1 . 1 GENERAL  I INTRODUCTION 

Modern  information  theory  is  based  on  the  work  done  by  Shannon  |1| 
in  the  late  1940's.  The  communication  system  which  he  analyzed  is 
shown  in  figure  1.1.  The  information  source  generates  random  messages 
which  are  encoded  into  channel  input  signals.  These  signals  are  then 
sent  over  the  communication  channel  to  the  destination.  In  the  process 
of  being  transmitted,  the  signals  are  subjected  to  random  disturbances, 
or  noise.  On  the  basis  of  the  received  signal,  the  decoder  makes  an 
estimate  of  the  message  output  by  the  source. 


ESTIMATE 

OF 

message  message 


Figure  1.1  - BASIC  POINT  TO  POINT  COMMUNICATION  SYSTEM 


Shannon's  fundamental  result  is  that  if  the  rate  of  the  source 
is  less  than  a quantity  C,  referred  to  as  the  capacity  of  the  channel, 
then  reliable  communication  is  possible.  That  is,  through  appropriate 
coding,  the  decoder  can  estimate  source  messages  with  arbitrarily  small 
probability  of  error.  On  the  other  hand,  if  the  source  rate  exceeds 
capacity,  this  is  not  possible.  This  result,  known  as  the  noisy  chan- 
nel coding  theorem,  was  very  surprising  as  it  had  previously  been  be- 
lieved that  there  was  a gradual  trade-off  between  rate  and  probability 
of  error. 

Since  the  pioneering  work  of  Shannon,  many  other  significant  con- 
tributions have  been  made  to  the  theory  as  testified  to  by  the  vast 
literature  [2,  3|.  During  the  last  few  years,  a great  deal  of  research 
effort  has  been  devoted  to  the  study  of  multi-user  communication  systems, 
that  is,  systems  involving  more  than  a single  source  and  a single  re- 
ceiver. In  this  dissertation,  we  will  analyze  some  such  systems. 

In  this  introductory  chapter,  some  previously  studied  multi- 
user networks  are  briefly  reviewed.  In  the  second  chapter,  the  concept 
of  the  wiretap  channel  is  presented  and  the  complete  set  of  rate- 
equivocation  pairs  for  the  additive  white  Gaussian  noise  wiretap  chan- 
nel is  found.  Some  useful  characterizations  of  a special  class  of 
wiretap  channels  are  also  examined.  In  chapter  3,  a model  for  the 
wiretap  channel  with  feedback  is  introduced.  An  innerbound  and  an 
outerbound  to  the  achievable  region  are  derived.  A scheme  for  en- 
larging the  capacity  region  of  multiple-access  channels  using  feed- 
back is  proposed  and  analyzed  in  chapter  4. 
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1 . 2 PRELIMINARIES  AND  DEFINITIONS 

Suppose  { U -j  > ^ is  a sequence  of  L letters  from  a discrete 
stationary  source.  Then  the  rate  or  entropy  of  the  source  measured 
in  bits  per  source  letter  is  defined  as 

H(U)  = lim  1 H(Ui,  U2.  • ••,  UL).  (1.2.1) 

[_-»  oo 

Let  us  assume  that  an  encoder  takes  blocks  of  L source  letters  at 
a time  and  maps  them  into  channel  codewords  of  block  length  N.  The 
communication  rate  R in  bits  per  channel  use  is  given  by 

H ( U j , U2,  ...,  UL) 

R = • (1.2.2) 

N 

In  the  single  source,  single  receiver  communication  system  depicted 
in  figure  1.1,  a single  number  C is  sufficient  to  describe  the  net- 
work's capacity  for  reliable  communication.  However,  in  multi-user 
networks  we  are  interested  in  several  simultaneous  transmission  rates 
R[ , R , ...,  where  Ri , 1 i M refers  to  the  rate  over  the  ith 
communication  link.  These  rates  are  often  considered  as  a rate  vector 
R and  lead  to  a generalization  of  the  concept  of  capacity  to  that  of 
a capacity  boundary  surface  in  Euclidean  M-space. 

Definition  1.1:  A rate  vector  R = (RM  R2,  ...,  R^)  is  said  to  be 

achievable  by  a communication  network  if  reliable  transmissions  are 
simultaneously  possible  over  the  network  at  rates  R - £ for  all 
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£ = (tj , l2,  ...  , >.  M)  where  >0,1  £ i _ M. 

Definition  1.2:  The  capacity  region  f of  a communication  network 

is  the  set  of  all  achievable  rate  vectors  R. 

It  follows  from  the  definitions  that  f is  a closed  subset  of 
the  positive  orthant  of  Euclidean  M-space. 

A time-sharing  argument  shows  that  C is  also  convex:  If  R,  and 
R^  are  two  achievable  rate  vectors,  then  R = x Rj  + (1-x)  R , 

0 > _ 1,  is  also  achievable  by  the  scheme  which  uses  codes  designed 

for  R,  and  R,  for  fractions  a and  (l-'O  of  the  time. 

The  next  two  sections  will  be  a brief  review  of  two  extensively  studied 
multi-user  communication  networks,  namely  the  broadcast  channel  and  the 
multiple-access  channel.  In  both  cases,  the  channels  perturbed  by 
additive  white  Gaussian  noise  (AWGN)  are  used  as  examples.  Schemes 
for  achieving  rates  on  the  boundary  of  the  corresponding  capacity  regions 
are  outlined  as  they  provide  some  insight  and  perspective  into  the  prob- 
lems to  be  considered  in  subsequent  chapters. 


1.3  BROADCAST  CHANNEL 


In  an  innovative  paper  in  1972,  Cover  1 4 J analyzed  the  broadcast 
channel  in  which  one  transmitter  wishes  to  send  information  to  several 
receivers  simultaneously.  For  simplicity,  the  two-receiver  broadcast 
channel  is  shown  in  figure  1.2.  We  shall  assume  that  there  is  no  com- 
mon information  to  be  transmitted  to  the  two  receivers  (i.e.,  the 
source  messages  W)  and  W are  independent).  The  encoder  maps  Wi  and 
11  into  the  channel  input  signal  X.  The  memoryless  broadcast  channel 
is  specified  by  a set  of  conditional  probability  distributions  Pv  v v 

Y ! r ^ , A 

on  the  outputs  V and  Y2  given  the  input  X.  Because  the  receivers 
are  not  allowed  to  collaborate,  possible  dependence  between  Y]  and  Y 
conditioned  on  X is  irrelevant.  Thus  only  a knowledge  of  the  mar- 
ginal distributions  Py  and  Py  is  required.  On  the  basis  of 
his  channel  output  Yj  , the  first  receiver  makes  an  estimate  W5  of 
W . Similarly,  the  second  receiver  estimates  W from  Y . 

One  of  the  main  results  to  come  out  of  Cover's  work  is  that  if 
separate  messages  are  to  be  sent  to  several  receivers  from  a common 
transmitter,  it  is  often  possible  to  do  better  than  just  time-sharing 
the  transmitter  between  the  receivers  to  achieve  R = (a  Cj,  (1-a)  C ) 
where  C^,  i =1 ,2  denotes  the  capacity  from  the  transmitter  to  the  ith 
receiver.  Roughly  speaking,  this  improvement  is  achieved  by  super- 
imposing high-rate  information  on  low-rate  information.  Although  the 
capacity  region  of  general  memoryless  broadcast  channels  is  not  known, 
some  special  classes  of  channels  for  which  it  has  been  determined  are 
given  in  |4|.  Capacity  regions  typifying  those  of  the  incompatible 
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ENCODER  CHANNEL  DECODERS 


gure  1.2  - BROADCAST  CHANNEL 


channel  and  the  orthogonal  channel  are  shown  in  figures  1.3  and  1.4. 
These  two  channels  represent  the  smallest  and  the  largest  capacity 
regions  that  a broadcast  channel  can  have.  In  an  incompatible  channel, 
communication  with  either  receiver  results  in  the  transmission  of  pure 
noise  to  the  other  receiver;  one  can  do  no  better  than  time-sharing. 

In  an  orthogonal  channel,  efficient  communication  with  either  receiver 
in  no  way  interferes  with  communication  to  the  other:  each  channel 

performs  as  well  in  the  presence  of  the  other  as  it  would  alone. 

An  important  concept  is  that  of  the  degraded  broadcast  channel  in 
which  the  channel  to  the  second  receiver  is  statistically  equivalent 
to  the  cascade  of  the  channel  to  the  first  receiver  and  some  other  chan- 
nel. Bergmans  [5]  established  an  achievable  rate  region  for  degraded 
broadcast  channels  through  the  use  of  a satellite  coding  scheme  in  which 
closely  spaced  codewords  have  distinguishable  meanings  only  for  the  bet- 
ter receiver.  The  corresponding  converse  theorems  for  the  binary 
symmetric  and  Gaussian  broadcast  channels  were  proved  by  Wyner  |6] 
and  Bergmans  [9]  respectively.  The  converse  for  general  degraded 
channels  is  due  to  Gallager  ( 7 ] . 

A degraded  broadcast  channel  for  which  Cover  gave  an  achievable 
rate  region  is  the  additive  white  Gaussian  noise  (AWGN)  broadcast  chan- 
nel shown  in  figure  1.5.  The  channel  is  described  by 


Y)  = X + n j 

Y2  = X + n? 


(1.3.1) 
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where  n , and  n2  are  Gaussian  noises  with  variances  a\2  and  o22 
respectively  and  a,-  <a22.  There  is  an  average  power  constraint 
on  the  input  signal,  given  by 


N 

E Xt2  1 NP  (1.3.2) 

where  N is  the  block  length.  It  is  well  known  |1,  3]  that  the 
individual  capacities  are 


Ci  « i log  (1  + £2  ) 

t.  Oj 

C?  = \ log  (1  + ^2) 


(1.3.3) 


bits  Der  transmission,  where  all  logarithms  are  to  the  base  2. 


Tl 
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Figure  1,5  - GAUSSIAN  BROADCAST  CHANNEL 


Cover  [4 | suggested  the  following  scheme  for  improving  upon  the 
time-sharing  performance.  The  idea  is  to  devote  a fraction  1 = (l->) 
of  the  allowed  power  P to  the  transmission  of  the  message  intended 
for  the  second  receiver.  The  remaining  power  tP  is  used  to  communicate 
with  the  first  receiver.  Since  < o:  , any  message  which  can  be 
reliably  decoded  by  receiver  2 can  also  be  reliably  decoded  by 
receiver  1.  By  first  decoding  the  signal  intended  for  receiver  2 
and  subtracting  it  from  his  received  signal,  the  first  receiver  can 
convert  his  channel  to  a Gaussian  channel  with  input  power  constraint 
tP  and  additive  Gaussian  noise  of  variance  Oj2. 

Using  the  above  superposition  scheme,  independent  transmissions  to 
receivers  1 and  2 can  be  simultaneously  achieved  at  rates 


Ri 

R2 


log  (1  + — — 2-  ) 


"xP 


log  (1  * 


(1.3.4) 


Gaussian  broadcast  channels  are  more  extensively  discussed  in 
[8J,  where  it  is  shown  that  the  time-sharing  rate  region  is  dominated 
by  suitable  frequency  multiplexing.  The  latter  is  in  turn  dominated 
by  the  rate  pair  of  (1.3.4)  whose  optimality  has  been  established  by 
Bergmans  [9].  The  capacity  region  for  the  Gaussian  broadcast  channel 
is  depicted  in  figure  1.6. 
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Cover  [10]  and  van  der  Meulen  [11]  have  found  an  achievable  rate 
region  for  general  discrete  memoryless  broadcast  channels.  However, 
no  way  is  known  for  proving  a converse.  Recently  it  has  been  shown 
by  Gelfand  [12]  that  the  achievable  rate  region  given  in  [10]  and  [11] 
is  not  the  capacity  region. 


1.4  MULTIPLE-ACCESS  CHANNEL 


In  this  section,  we  review  some  results  concerning  the  multiple- 
access  channel  which  will  be  useful  in  Chapter  4.  Figure  1.7  shows  the 
inul i tple-access  channel  in  which  several  transmitters  communicate  with 
a single  receiver.  The  channel  is  characterized  by  its  input  alphabets 
3,3,  ....  3 its  output  alphabet  S , and  a set  of  conditional 
probability  measures  on  the  output  signal  Y given  the  input  signals 
X i , X.,  ...,  X^.  For  simplicity,  we  shall  deal  with  only  two  senders 
and  assume  that  the  channel  is  memoryless,  i.e. 

N 

p (y ! xl  , x2)  = n P (yi  |xn-,  x2i)  (1.4.1) 


where  y 

= (Yi > • • 

= (xn,  . 

X1N^ 

*2 

( X21 ’ • 

X2N^ 

The  sources  will  be  considered  to  be  independent.  (See  |13]  for  an 
investigation  of  the  savings  that  can  be  achieved  when  the  sources  are 
dependent.)  Recently  Ahlswede  (14],  Liao  [15],  and  Slepian  and  Wolf 
[13]  have  determined  the  capacity  region  t of  a two-input,  single  out- 
put, discrete  memoryless  channel: 

<? = closure  of  the  convex  hull  of  the  union,  over  all  input  distributions 
Py  Y ( • , . ) with  independent  X.,  X , of  the  sets  of  rate  pairs 

A]  , A 2 

R = ( R i , R? ) satisfying 
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0 . R 


0 . R + R 


I(X  ; Y X,) 


I (x  , X ; Y). 


(1.4.2) 


The  additive  white  Gaussian  noise  (AWGN)  multiple-access  channel 
is  the  most  commonly  encountered  continuous  alphabet  channel  ( t ^ = 

5 2 - ^ ) • The  output  signal  Y = X^  + X^,  + V where  X^  and  X?  are 

the  input  signals  and  V is  zero-mean  Gaussian  noise  independent  of 
Xj  and  X2  with  variance  EV  = ••'.  There  are  average  power  constraints 
P and  on  the  inputs  which  require  that  the  encoded  messages  Xj  and 
X2  (which  have  N components)  satisfy 


E . . Mil  ] _ N P,-,  i * J,  2 

J =1  J 


(1.4.3) 


The  capacity  region  <z  for  the  AWGfl  multiple-access  channel  has 
been  determined  by  Wyner  ( 1 6 1 and  Cover  |17|:  C = set  of  all  rate 
pairs  R = (R, , R ) satisfying 


R . I log  (1  + 4)  ' C, 


R w log  (1  + — ) A C 

O' 

i Pi  + P 

R + R>  ; p-  log  (1  + ) A C 


(1.4.4  a) 


(1.4.4  b) 


(1.4.4  c) 


The  region  (?  corresponds  to  the  shaded  region  shown  in  figure  1.8. 


t 


Figure  1.8  - CAPACITY  REGION  FOR  GAUSSIAN  MULTIPLE-ACCESS  CHANNEL 


1 P2 

Point  A:  (Ci,  j log  (1  + ) 

1 pi 

Point  B:  log  (1  + y—^2) , C2) 

The  proof  that  points  outside  C are  not  achievable  goes  as  follows: 

The  limits  on  R,  and  R2  given  by  (1.4.4  a)  and  (1.4. 4 b)  hold  because 
they  are  the  values  of  the  channel  capacities  for  the  individual  links. 
To  verify  the  bound  on  + R2,  we  first  note  that 


1.5) 


.6 
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From  the  independence  of  X,,  X and  V,  we  can  upperbound  the  variance 
of  Y by 

var  (Y)  = var  (X|)  + var  (X  ) + var  V (1.4.8) 

P + P + (1.4.9) 

From  j3.  Theorem  7.4.1  j 

H(Y)  <_  ^ 1°g  2 e (Pi  + P2  + ) (1.4.10) 

Substitution  of  H ( Y ) in  (1.4.7)  yields  the  desired  result 

I(Xi,  X ; Y)  ■ \ log  (1  + Pi-^  ) * C12  (1.4.11) 

The  achievabi 1 i ty  of  <?  can  be  proved  by  showing  that  the  extreme 

points  A and  B in  figure  1.8  are  achievable  since  time-sharing  then 

yields  any  point  on  the  line  connecting  A and  8.  The  whole  region  <? 

is  obtained  by  noting  that  information  can  be  thrown  away,  i.e.  if 

( R j , R ) is  achievable,  so  is  ( R i - e i , R - r ) for  Ei , e\  > 0. 

The  following  coding  scheme  attains  a rate  pair  corresponding  to 

point  A,  and  a similar  system  will  achieve  rates  corresponding  to 

NR  • 

point  B.  For  use  at  transmitter  i,  i = 1,2,  we  choose  2 i independent 
codewords  at  random  in  the  usual  way  according  to  a Gaussian  distribution 
with  mean  0 and  variance  P^ ' = P-j  - n where  0 can  be  chosen 

arbitrarily  small.  The  receiver  first  attempts  to  decode  the  message 
W?  from  the  second  transmitter.  In  so  doing,  it  treats  the  signal 
X j as  Gaussian  noise  of  variance  P,'.  Because  of  the  independence 
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of  X | and  V,  the  effective  noise  observed  by  the  receiver  in  decoding 

W is  Gaussian  noise  of  variance  (Pi 1 + a7) . From  the  coding  theorem 

for  Gaussian  channels,  we  know  that  Wo  can  be  reliably  decoded  if 
1 Pl 

R • 2 log  (1  + p + g?) . Once  W2  (and  hence  X2)  has  been  determined, 
the  receiver  can  subtract  X2  from  the  received  vector  Y to  obtain 
Y - X = X,  + V.  But  for  this  situation,  we  know  that  reliable  trans- 
mission from  the  first  transmitter  is  possible  if  R[  < Cj  = 1°9 

(1  + P ).  This  completes  the  proof  that  point  A is  achievable. 
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1.5  SOME  USEFUL  INFORMATION  THEORETIC  INEQUALITIES 

To  conclude  this  introductory  chapter,  we  recall 

several  in- 

equalities  which  will  be  extensively  used  in  the  next 

chapter. 

Definition  1.3:  The  sequence  of  random  variables 

Xi  ^ where 

n _ 3 is  said  to  be  a Markov  chain  if  (X  , X , ..., 

Xj  ^ ) and 

( X j + 1 , ...,  Xp)  are  conditionally  independent  given 

Xj , 2 _ j _ n-1  . 

Lemma  1 . 1 If  X,  Y,  Z is  a Markov  chain,  then 

H(Z  X,  Y)  = H(Z | Y) 

(1.5.1  a) 

and  H ( X Y , Z)  = H(X;Y) 

Proof: 

H ( Z | X , Y ) ^ Z p(x,y,z)  log  , , v 

x,y,z  1 

(1.5.1  b) 

j 

(1.5.2) 

where  the  sum  is  over  all  x>  S , y(  y , z 3 . Recall  script  letters 

denote  sample  spaces. 

1 

Since  X,  Y,  Z is  a Markov  chain,  we  have 

! 

pU|x,  y)  = p(z|y) 

(1.5.3) 

i 

Substituting  (1.5.3)  in  (1.5.2),  we  obtain 

H(Z|X,  Y)  = y>.2  p(y,z)  log  p(z|y) 

(1.5.4) 

= H(Z | Y) 

(1.5.5) 
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j 

This  establishes  (1.5.1  a).  A similar  proof  using  p(x|y,  z)  = p(x 
yields  (1.5.1  b). 

Lemma  1.2  (Data  Processing  Theorem) 

Let  X,  Y,  Z form  a Markov  chain.  Then 

I (X;  Z)  < I ( X ; Y)  (1.5.6) 

Proof:  See  Gallager  [3,  theorem  4.3.3], 

Lemma  1 .3  (Fano's  Inequality) 

Let  U,  V be  random  variables  which  take  values  in 
number  of  elements,  M,  in  U is  finite.  Let 

X = Pr  ill  t VI. 

Then 

H(U | V)  i h(A.)  + a log  (M-l ) 
where  h(x)  = -A  log  A - (1-a)  log  (1-a) 

is  the  binary  entropy  function. 

Proof : See  Gallager  |3,  theorem  4.3.1]. 

We  shall  also  need  the  following  version  of  Fano's  inequality 
which  deals  with  the  average  bit  error  probability. 


ll  , where  the 

(1.5.7) 

(1.5.8) 

(1.5.9) 
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Definition  1.4:  Suppose  a.  sequence  vL  = 

used  to  approximate  a sequence  u*-  = (uj,  u , 

then  we  say  that  an  error  has  occurred  in  the 

the  probability  of  such  an  error  by  P .We 

6 , < 

error  probability  (P  ) by 

0 

L 

<Pe>  ' l Jl  Pe,t 

Lemma  1 .4  (Fano's  Inequality) 

Let  the  coordinates  of  U*"  and  take  values  in  the  set  1j 
which  has  cardinality  M.  Let  (Pg)  be  defined  as  in  (1.5.10).  Then 

1 h(ul| vL)  : h ( <Pe)  ) + <Pe)  log  (m-i)  (1.5.11) 

Proof:  See  Gallager  |3,  theorem  4.3.2} 


(v,  , V vL)  is 

. . . , u.  ) . If  u t v , 

L x 

nth  digit  and  we  denote 
define  the  per  symbol 


(1.5.10) 
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CHAPTER  2 


I 

THE  GAUSSIAN  WIRETAP  CHANNEL 


2.1  INTRODUCTION 

Traditional ly,  information  theory  has  mainly  been  concerned  with 
the  maximum  rates  at  which  reliable  communication  is  possible.  Aside 
from  an  early  paper  by  Shannon  [18|,  privacy  and  security  problems  had 
been  given  very  little  attention  in  the  information  theoretic  literature. 
Until  recently  these  problems  were  considered  mainly  from  a cryptographic 
viewpoint . 

In  a recent  paper  1 1 9 J , Wyner  formulated  a problem  in  which  privacy 
is  to  be  taken  into  account.  We  begin  this  chapter  with  a review  of 
Wyner's  results  for  discrete  memoryless  wiretap  channels.  We  then 
extend  his  results  to  the  Gaussian  wiretap  channel  and  explicitly 
determine  the  complete  achievable  rate-equivocation  region.  Finally 
some  useful  characterizations  of  a special  class  of  wiretap  channels 
are  examined. 

The  model  which  Wyner  proposed  is  shown  in  figure  2.1.  It  is  a 
form  of  degraded  broadcast  channel  [5],  with  the  novel  change  that  one 
information  rate  is  to  be  maximized  and  the  other  minimized.  The  object 
is  to  maximize  the  rate  of  rel iable  communication  from  the  source  to 
the  legitimate  receiver,  subject  to  the  constraint  that  the  wiretapper 
learns  as  little  as  possible  about  the  source  output.  The  wiretapper 
knows  the  encoding  scheme  used  at  the  transmitter  and  the  decoding 
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scheme  used  at  the  legitimate  receiver,  and  is  kept  ignorant  solely 
by  the  greater  noise  present  in  his  received  signal.  Thus,  while  the 
objective  is  the  same  as  in  cryptography,  the  technique  used  to  achieve 
privacy  is  very  different. 

The  source  is  stationary  and  ergodic,  and  has  a finite  alphabet. 

k N 

The  first  k source  outputs  J are  encoded  into  an  N-vector  x 

which  is  the  input  to  the  main  channel.  The  legitimate  receiver  makes 
~ k k N 

an  estimate  J of  s based  on  y , the  output  of  the  main  channel, 
incurring  a block  error  rate 

Pg  = Pr  ( 4k  f ? } . (2.1.1) 

N 

y is  also  the  input  to  the  wiretap  channel  and  the  wiretapper  has 

k • N 

average  residual  uncertainty  H(  A ,Z  ) after  observing  the  output, 

N 

z of  the  wiretap  channel.  Of  course,  the  problem  remainc  unchanged 

w M 

if  z is  the  output  of  a single  channel  with  input  x and 

statistically  equivalent  to  the  cascade  of  the  main  and  wiretap  channels, 

N N 

since  dependencies  between  y and  z are  immaterial.  This  is  not 
necessarily  the  case  if  feedback  is  allowed  (see  Chapter  3). 

We  define  the  fractional  equivocation  of  the  wiretapper  to  be 

A = H ( /|ZN)  / H ( A k)  , (2.1.2) 

and  the  rate  of  transmission  to  the  legitimate  receiver  to  be 

R = H(  Ak)  / N . (2.1.3) 

Note  that  A = 1 implies  that  the  wiretapper's  a posteriori 
uncertainty  about  the  source  output  is  equal  to  his  a priori  uncertainty. 


23 


J 


Thus  when  a = 1,  the  wiretapper  is  no  better  informed  after  he  receives 
his  data  than  he  was  before.  We  shall  say  that  the  pair  (R*,  d*)  is 
achievable  if  for  all  o,  there  exists  an  encoder-decoder  pair  such 

that 


R > R*  - t (2.1.4  a) 

• . d*  - (2.1.4  b) 

Pe  _ t (2.1.4  c) 

Our  definitions  are  slightly  different  from  Wyner's  original  definitions. 
For  example,  Wyner  defines  = H ( & Z)  / k.  (We  will  drop  superscripts 

when  context  permits).  The  new  definitions  are  used  because  they  allow 

somewhat  simpler  theorem  statements  and  proofs. 

The  main  objective  is  to  characterize  the  set  of  achievable  (R,  d) 
pairs.  In  order  to  develop  some  intuition  for  this  problem,  consider 
the  simple  example  depicted  in  figure  2.2. 


WIRETAPPER 


Figure  2.2  - SIMPLE  WIRETAP  CHANNEL 


Here  the  source  is  binary  symmetric,  that  is,  successive  source  outputs 
are  independent  and  equiprobable.  The  main  and  wiretap  channels  are 
binary  symmetric  channels  (BSC)  with  crossover  probabilities  of  0 and 
respectively.  We  now  analyze  two  very  simple  schemes  [19]  which  might 
be  used. 

Scheme  1:  Let  k = N = 1 and  X = S.  Then  the  transmission 

rate  R is  1 and  since  the  main  channel  is  noiseless,  Pg  = 0 . The 
equivocation  at  the  wiretapper  is  A = H(XjZ)  = h(e)  where  h(-) 
is  the  binary  entropy  function.  Thus,  this  scheme  achieves  (R,  d)  = 

(1,  h ( ) ). 

Scheme  2:  Here  we  let  k = 1 and  N be  arbitrary.  Thus  we 

are  sending  one  of  two  messages  in  N channel  uses.  We  partition 

binary  N-space  into  two  subsets:  CQ  consisting  of  all  N-vectors  with 

an  even  number  of  l's  and  C]  consisting  of  all  N-vectors  with  an 
odd  number  of  l's.  If  the  source  message  is  0,  the  encoder  outputs 
a vector  chosen  totally  at  random  from  C . Otherwise,  the  encoder 
output  is  a randomly  chosen  vector  from  Cj.  Because  the  main  channel 
is  noiseless,  the  legitimate  receiver  can  recover  the  source  message 
from  XN  perfectly  so  that  Pg  = 0.  It  can  be  shown  that  H(S]Z^)  = 
h (?  “ 2 (1  ~ 2e)^)  which  tends  to  1 as  N > °°.  This  means  that  as 
N -»  , the  wiretapper  becomes  totally  confused.  However,  as  N ► ■» 

the  transmission  rate  R = ^ ■+  0.  Thus  the  question  arises  as  to 
whether  it  is  possible  to  transmit  reliably  at  a positive  rate  and  yet 
at  the  same  time  completely  foil  the  wiretapper. 


Wyner  has  determined  the  complete  achievable  (R,  d)  region  when 
both  channels  are  discrete  memoryless  channels.  He  shows  that  in  most 
cases  there  is  a secrecy  capacity  C$  0 such  that  (R,  d)  = ( C , 1) 
is  achievable.  Thus,  it  is  possible  to  reliably  transmit  information 
to  the  legitimate  receiver  up  to  a rate  Cs  in  essentially  perfect 
secrecy.  More  generally,  we  have 

Theorem  2.1  (Wyner). 

Let  p^(-)  be  a probability  distribution  on  the  input  of  the  main 
channel.  Define  p(R),  R _ 0 to  be  the  set  of  p^  such  that 
I (X;  Y)  _ R.  For  0 _ R < C^,  where  is  the  capacity  of  the  main 

channel , let 

f (R)  = max  I (X;  Y]Z)  (2.1.5) 

Px  e p(R) 

Then  the  set  of  all  achievable  (R,  d)  pairs  is  given  by 

= i ( R , d ) ; 0:R  . CM,  0 < d ■_  1 , R d _ i (R)  ■ (2.1.6) 

if  both  channels  are  discrete  and  memoryless. 

For  the  example  of  figure  2.2,  application  of  theorem  2.1  shows 
that  the  set  of  achievable  points  is  defined  by 

R < 1 (2.1.6  a) 

d _ 1 (2.1.6  b) 

R d h(>  ) (2.1.6  c) 
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For  more  details,  see  Appendix  I.  This  result  implies  that  scheme  1 
which  achieves  R = 1,  d = h(e)  is  optimal  in  that  for  R = 1,  the 
maximum  achievable  equivocation  is  h(L).  On  the  other  hand,  scheme  2 
which  achieves  R = 0,  d = 1 is  not  optimal  since  in  fact  for  d = 1, 
it  is  possible  to  transmit  up  to  a rate  R = h(i.). 

The  family  of  achievable  (R,  d)  points  in  (2.1.6)  is  sketched  in 
figure  2.3. 


Figure  2.3  - COMPLETE  ACHIEVABLE  (R,  d)  REGION  FOR 
A SIMPLE  WIRETAP  CHANNEL 

As  noted  by  Wyner,  this  region  is  not  convex.  Surprisingly,  however, 

Rd  = c as  in  (2.1.6  c)  corresponds  to  a time  sharing  curve  as  established 
in  the  following  lemma. 


Lemma  2 . 1 


Let  R^  = R^dp  = c,  a constant.  Assume  R?  and  hence  d,, 

If  the  points  (R^,d^)  and  (R2,d2)  are  achievable  then  through  time- 
sharing any  point  (R,d)  with  R2  R R-j  , d^  d and  Rd  = c 

is  achievable. 


Proof: 

Consider  a block  of  N channel  uses.  Assume  that  for  xN  trans- 
missions we  operate  at  (R^,d^)  and  for  ( 1 - 1 ) N transmissions  we 
operate  at  (R^d^).  Then  the  effective  equivocation  is 


.NR]d1  + (l-ct)NR2d2 
xNF^  + (1  -T)  NR“ " 


(2.1.7) 


The  effective  transmission  rate  is 

R = | ,NR]  + (1-0NR2I/N  (2.1.8) 

so  that 

Rd  = .R1d1  + (l-.)R2d2.  (2.1.9) 

We  see  that  time  sharing  averages  the  product  R^d.  and  if  R^d^  = 

R2d2  = c then  Rd  = c.  □ 

This  lemma  will  be  of  importance  in  establishing  the  achievable 
(R,d)  region  for  the  Gaussian  wire-tap  channel,  shown  in  figure  2.4. 

As  before,  the  source  is  a stationary,  ergodic,  finite  alphabet  source. 
The  noise  vectors  n^,  and  n2^  are  independent  and  have  components 
which  are  independent  identically  distributed  (i.i.d.)  71(0,  ^- ) and 
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Jl(o,  ■),  i.e.  normal  random  variables  with  mean  zero  and  variances 
and  respectively.  The  channel  is  power  limited  in  that 

N 

(1/N)  5:  E(X/)  < P.  (2.1.10) 

i = l 

In  the  following  two  sections  we  utilize  Wyner's  framework  to 
completely  characterize  the  achievable  (R,d)  region  for  this 
Gaussian  wire-tap  channel.  Letting 

CM  = 1/2  log  (1  + P/aj2)  (2.1.11) 


and 

CMW  = 1/2  log  (1  + P/(oj:  + o22)  ) (2.1.12) 

denote  the  capacities  of  the  main  and  overall  wire-tap  channels  re- 
spectively, we  shall  show  that  the  secrecy  capacity  is  given  by 


f = r -f 
s UMW 

and  in  general,  we  have  the  following  result. 


(2.1.13) 


Theorem  2 . 2 

For  the  Gaussian  wiretap  channel  the  set  % of  all  achievable 
(R,d)  pairs  is  defined  by 

R < CM  (2.1.14) 

d < 1 (2.1.15) 

Rd  < Cs.  (2.1.16) 
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In  the  next  section  we  establish  the  achievabi 1 i ty  of  this  region 
by  showing  that  the  extreme  points  of  A 


(Rl»d-|)  ( C^,  Cs/C|y|)  (2.1.17) 

and 

( » d 2 ) ” (cs>  (2.1.18) 

are  both  achievable.  We  then  invoke  the  time  sharing  argument,  and 
say  that  since  R-| d^  = R^d^  = Cs  the  entire  region  A defined  in 
theorem  2.2  is  achievable. 

In  section  2.3  we  establish  the  converse,  that  any  point  outside  A 
is  not  achievable. 
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2.2  DIRECT  HALF  OF  THEOREM  2.2 


As  noted  above  we  need  only  establish  that  the  two  points  ( R ^ , d ^ ) 
and  (R^d^)  defined  by  (2.1.17)  and  (2.1.18)  are  achievable,  since 
time  sharing  then  implies  the  achievabi 1 i ty  of  the  entire  region  of 
theorem  2.2. 

The  point  (R^,d^)  is  trivially  achieved  by  coding  as  if  the  wire 

tapper  were  absent.  Using  usual  source  and  channel  coding  arguments  it 

is  possible  for  R to  be  arbitrarily  close  to  CM  = R,  and  P to  be 

M 1 e 

arbitrarily  close  to  0.  But  the  information  gained  by  the  wiretapDer 
is  limited  by  the  capacity  of  his  channel  so  that 

A = H(  Ak  ZN)/H(  Ak) 

_ dk)  - ncmw)/h{  ^k) 

= i -(cmw/r)  (2.2.1) 

and  as  R approaches  C^,  this  lower  bound  on  a approaches 
Cs/C(v]  = d^ . Thus  the  point  (Rpd^)  is  achievable. 

We  will  establish  the  achievabi  1 i ty  of  (R^d^)  = (Cs,l)  by 
proving  a somewhat  stronger  result,  similar  to  that  of  Heilman  and 
Carleial  1 20 1 . If  Cs  = C^/2  theorem  2.2  states  that  by  cutting  our 
rate  in  half  we  can  completely  foil  the  wire  tapper.  Instead,  we  will 
show  that  it  is  possible  to  reliably  send  two  independent  messages, 
each  at  a rate  near  Cs=  C^/2,  and  each  totally  protected  from  the 
wire  tapper  on  an  individual  basis.  The  penalty  is  that  if  the  wire 
tapper  learns  one  message  through  other  means  he  can  then  also  determine 
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the  other  message.  In  general,  if  Cs  j_  C^/n  we  will  show  that  n 
independent  messages  can  be  simultaneously  and  reliably  communicated 
to  the  legitimate  receiver,  each  at  a rate  near  C^/n,  and  each 
totally  protected  on  an  individual  basis.  However  if  the  wire  tapper 
learns  any  one  message  he  may  be  able  to  determine  all  of  the  others. 

By  using  random  noise  for  all  but  the  first  message  we  can  obtain  the 
direct  half  of  theorem  2.2  as  a special  case  of  theorem  2.3. 

Theorem  2.3 

Let  um  be  a sequence  of  m outputs  from  a finite  alphabet, 
stationary,  ergodic  source  with  per  letter  entropy  H(b),  and  let  iP 
be  any  p consecutive  components  of  um.  Then  provided 

Rt  - H(  l)m)/N  < CM  (2.2.2) 

Rs  = H(  ^ P)/N  < Cs  (2.2.3) 

it  is  possible  by  choosing  N large  enough,  to  reliably  communicate  um 
to  the  legitimate  receiver  in  N channel  uses  and  yet  to  ensure  that 

&s  = H(  AP|ZN)/H(  *P)  (2.2.4) 

is  arbitrarily  close  to  1. 

Further  if  ( iP^t^  are  K such  consecutive  p-tuples  of 
um  it  is  possible  to  ensure  that 

Asi  h H(  MP.|ZN)/H(  ^Pi>  (2.2.5) 

is  arbitrarily  close  to  1 for  l^iH<,  with  K fixed  as  N » 
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Remarks : 


1.  An  alternative  notation  would  be  to  use  A m in  place  of  lim,  but 
then  superscripts  would  be  necessary  to  distinguish  between  the  entire 
ergodic  source  output  and  its  projection,  >p,  to  be  kept  secret.  This 
remark  will  hopefully  remove  any  confusion  caused  by  our  choice  of 
notation. 

2.  Until  now,  the  entire  sequence  Um  was  to  be  protected  so  that  m 
equalled  p,  R$  equalled  R^,  and  A$  equalled  A^..  Now  the 
distinction  between  li"1  and  its  projection  ap  requires  us  to 
distinguish  between  the  total  rate  Rt>  and  the  secrecy  rate  R . 

3.  For  memoryless  sources  the  p outputs  in  ,*p  need  not  be 
consecutive,  and  more  general  p - dimensional  projections  are  pos- 
sible. See  1 20]  for  a generalization  to  linear  projections. 

Theorem  2.3  will  be  established  by  proving  a sequence  of  lemmas. 
First,  since  the  source  is  ergodic  we  have: 

Lemma  2.2 

It  is  possible  to  reliably  source  code  um  into  a binary  n-vector 
un  such  that  each  of  the  4P.  are  reliably  determined  (i.e.  as  N *•  » 

the  probability  of  error  tends  to  0)  by  k consecutive  components 

k , n 
s.j  of  u and 


with  ■ 0. 


Remark : 

Script  um  denotes  the  entire,  ergodic  source  output,  and  script 
denotes  a p-dimensional  projection  thereof;  un  denotes  the 

m 1/ 

binary  source  coded  version  of  u and  s^  denotes  a k-dimensional 
projection  thereof.  Further  s is  a binary,  source  coded  version  of 
*P. 

Proof:  From  (2.2.2)  and  (2.2.3),  we  can  define 

£ = min  l(CM-Rt)/3,  (Cs-Rs)/2}  0 (2.2.8  a) 

In  proving  that  R and  R.  can  be  made  to  approach  C and  C,.  while 
st  s M 

's  is  kept  arbitrarily  close  to  1 , we  can  redefine  Rs  and  Rt  so  that 

(CM-Rt)/3  = (Cs-Rs)/2  = £ (2.2.8  b) 

where  is  given  by  (2.2.8  a),  since  excess  rate  can  be  discarded. 

The  noiseless  source  coding  theorem  for  ergodic  sources  [3,  theorem 

3.5.31  then  implies  that  (2.2.6)  and  (2.2.7)  can  be  satisfied. 

|/ 

There  is  a minor  problem  in  ensuring  that  s consists  of  k consecutive 
bits  of  un,  but  this  is  easily  overcome. 

If  the  (4P. } are  disjoint  we  can  clearly  code  in  sub-blocks 
while  satisfying  (2.2.7)  and  the  condition  on  s^  being  consecutive 
bits  of  u.  Even  if  the  I 4 13  ) are  not  disjoint  we  can  still  satisfy 
these  conditions.  For  example  if  iP-|  constitutes  the  first  3/4  of  um 
and  JP2  constitutes  the  last  3/4  of  um  we  can  code  um  in  four 


35 


equal  sub-blocks  to  obtain  un.  The  union  bound  guarantees  that  the 
overall  coding  from  u to  u is  reliable  since  each  of  the  four  sub- 
codings is  reliable.  □ 


We  will  henceforth  deal  with  only  one  of  the  s^  (or 
which  we  shall  denote  as  s (or  s ) . We  shall  show  that,  over  a 
suitable  ensemble  of  codes,  u can  be  reliably  communicated  to  the 
receiver  and  A$  kept  arbitrarily  near  1,  with  probability  which  ap- 
proaches 1 as  N k «>.  Use  of  the  union  bound  then  allows  us  to  state 
that  with  probability  approaching  1,  all  K A . can  be  kept  near  1. 
Now  define  an  ensemble  of  channel  codes  as  follows.  Each  code  in 
the  ensemble  has  2n  codewords  with  blocklength  N, 


1 2 2 

X1 , X , . . . , X 1. 


(2.2.9) 


Each  component  of  each  codeword  is  an  i.i.d.  random  variable  with  a 
Jl(0,  P-n)  distribution,  where  n 0 is  chosen  so  that 


CM(n)  1/2  log  (1  + (P-n)/o^  ) > CM  - e (2.2.10a; 


and 


2,2 


CMW(n)  1/2  log  (1  + (P-n)/(of + 0^))  > CMW  - e.  (2.2.10  b) 


Since  n = N(C^-2  ),  the  normal  coding  theorem  for  Gaussian  channels 
| 3,  theorem  7.4.2]  states  that  un  is  reliably  transmitted  to  the  re- 
ceiver by  almost  all  codes  in  the  ensemble  as  N * ■.  And  as  N - ■ 
almost  all  codes  in  the  ensemble  satisfy  the  power  constraint  (2.1.10) 
so  that  almost  all  codes  satisfy  both  conditions  as  N ► ■. 
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All  that  remains  is  to  show  that  A - 1 for  almost  all  codes  in 
the  ensemble. 

Lemma  2.3 

A$  > (H(U)  - H(U | S,Z)  - I(U;Z)]/NCS  (2.2.11) 

Proof:  Since  s is  a deterministic  function  of  i 

As  = H(4|Z)/H(  4 ) (2.2.12) 

>H(S|Z)/H(4)  (2.2.13) 

and  using  (2.2.3) 

H(  4 ) < NC  . (2.2.14) 

We  complete  the  proof  by  showing  that 

H(S|Z)  = H(U)  - H(u;S,Z)  - I (U ;Z ) . (2.2.15) 

By  definition 

H(U|Z)  = H(U)  - I (U;Z)  (2.2.16) 

and  since  s is  a function  of  u 

H(U|Z)  = H(U,S|Z)  = H(S|Z)  + H(U | S , Z ) . (2.2.17) 

□ 

We  now  proceed  to  bound  the  three  terms  in  (2.2.11). 

Lemma  2.4 

There  exists  a sequence  of  source  codes  of  increasing  blocklength 
such  that 


37 


where  ■'  stands  for  any  term  which  tends  to  0 as  c-0,  and  6 
stands  for  any  term  which  tends  to  0 as  N ■*  <*•  with  r>0  fixed. 

Proof: 

From  (2.2.2)  and  (2.2.8) 

H ( lJ  ) = NR 

= N(Cm  - 3c) 


= NCm(1  -c').  (2.2.19) 

Since  u is  a deterministic  function  of  u 

H(U)  = H(  -U  ) - H(1J  |U).  (2.2.20) 

Using  the  noiseless  source  coding  theorem  for  ergodic  sources  [3,  theorem 
3.5.3|  and  Fano's  inequality  (lemma  1.3) 

H ( tJ  |U)  1 + 6N  , (2.2.21  ) 

so  that 

H(u)  > NCm(1-c’  )-  1 - <5N 

= NCM(1 -e* -6) . (2.2.22) 


(Note  that  the  two  6's  are  not  equal).  □ 

We  now  bound  the  second  term,  H(U|S,Z),  in  (2.2.11). 
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Lemma  2.5 


As  N ->  » almost  all  codes  in  the  ensemble  obey 

H(U|S,Z)  < 6N  (2.2.23) 

Proof: 

Since  s^  is  a k-dimensional  projection  of  the  n-vector  u , given 
there  are  only  2n  ^ = 2^^MW”  ^ u's  to  be  decided  among  on  the 
basis  of  z^  But  the  codewords  associated  with  each  of  these  u's  was 
chosen  according  to  the  capacity  achieving  distribution  and  from  (2.2.10  b) 
we  know  the  error  probability  given  s^  and  z tends  to  0 as  N + °° 
for  almost  all  codes.  Use  of  Fano's  inequality  completes  the  proof. 

Finally,  use  of  the  data  processing  theorem  (lemma  1.2)  and  the 
definition  of  C^,  yields  a bound  on  the  third  term  in  (2.2.11). 

I(U;Z)<NCmw.  (2.2.24) 

Combining  (2.2.24)  and  the  preceding  three  lemmas  we  find  that 
for  almost  all  codes 

A$  1 [NCm  (1-e’-5)  - 6N  - NCmw1/NCs 
= NCs  (l-e'-6)/NCs 

= 1 -e' -6.  (2.2.25) 

Then  letting  N ■>  « with  fixed  c > 0 we  find  that 

11m  A > 1 -e'  , (2.2.26) 

N v ® S 
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and 


1 im  1 im  A = 1 
e ' 0 N *•  oo 


(2.2.27) 


for  almost  al 1 codes. 


is  completes  the  proof  that  (R^,  d£)  = (C^,  1)  is  achievable. 


* 


m 


2 . 3 CONVERSE  THEOREM 

In  this  section  we  prove  the  converse  part  of  theorem  2.2,  that 
any  point  (R,d)  outside  K is  not  achievable.  That  R and 

d 1 is  self  evident  from  the  definitions.  Our  real  task  is  to  show 
that  (2.1.16)  must  hold 

R d C$  (2.1.16) 

if  P is  arbitrarily  close  to  0.  (Note  that  in  this  section  we  are 
dealing  solely  with  s,  and  not  at  all  with  u of  the  last  section. 

We  can  therefore  use  R in  place  of  R$  and  A in  place  of  A$ 
without  ambiguity.)  The  proof  of  the  following  theorem  is  therefore 
the  goal  of  this  section. 

Theorem  2_.4 

Wiht  R,  A and  Pg  defined  as  in  (2.1.1)  - (2.1.3) 


R 


F 


k Pe  log(v)  + h(Per 
H(  ik) 


(2.3.1) 


where  is  the  size  of  the  source  alphabet  and  Cs  is  defined  by 
(2.1.13). 

If  instead  the  per  digit  error  rate 


k 

<PJ  = 1/k  E Pr  ( a.  f S .)  (2.3.2) 

e 1=1  1 1 

is  used  (2.3.1)  becomes 
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k[h((P  ))  + (P  ) log  (v-1 )] 

R A - £ < C (2.3.3) 

H (<$  ) S 


Thus  the  use  of  this  more  lenient  measure  of  reliability  would  not 
expand  the  region  $ . 


Remark : 

If  the  probability  of  error  at  the  legitimate  receiver  is  not 
required  to  be  arbitrarily  small,  theorem  2.4  can  still  be  used  to 
provide  an  outerbound  on  the  achievable  (R,  d)  region. 

The  proof  of  this  theorem  will  be  established  through  a sequence 
of  lemmas. 


Lemma  2.6 


k P log  (.)  + h(P  )l  I(XN;YN;zN) 
RN  J - T~ 


(2.3.4) 


R A - - 


M h«P  ))  + (P  ) log  (v-1)  | I(X;Y|Z) 


(2.3.5) 


Proof : 

First  note  that  through  use  of  the  data  processing  theorem 
(lemma  1.2)  and  Fano's  inequality  (lemma  1.3) 


i- 


HU|Z,_Y)  H(£|Y) 
H(  A A) 


< h(P  ) + kP  log  ( j)  . 

c c 


(2.3.6) 


Then  from  the  definitions  (2.1.2),  (2.1.3)  of  R and 


RN  A = H(  A |Z) , 


Using  (2.3.6) 


RN  A ^ H(  j Z)  - H ( £|Z,Y)  + h(Pe)  + kPe  log  (v) 


I(  i ;Y|Z)  + h(Pe)  + kPe  log(  ). 


(2.3.7) 


Since  A,  X,  Y,  Z form  a Markov  chain  the  data  processing  theorem 
imp! ies 


l(A  ;Y|Z)  < I(X;Y|Z). 


(2.3.8) 


RN  A _<  I (X;Y|Z)  + h(Pe)  + kPe  1 og ( 


(2.3.9) 


which  with  minor  algebra  establishes  (2.3.4).  Equation  (2.3.5)  is 
established  in  exactly  the  same  manner,  except  using  the  per  digit 
error  rate  version  of  Fano's  inequality  (lemma  1.4). 


Lemma  2.7 


I ( X ; Y | Z ) = £ log  ( + - 


) - |H(Z)  - H ( Y ) 1 (2.3.10) 
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Proof: 

While  the  entropy  of  a continuous  random  variable  is  lacking  in 
physical  significance,  if  we  define 

H ( A)  = - J p ( a ) log  [ p ( a ) | da  (2.3.11) 

it  is  known  that  differences  in  entropy  are  still  physically  meaning- 
ful as  mutual  information  (e.g.,  H ( A ) - H ( A | B ) = I(A;B),  see  (21  | for 
a full  development).  We  may  thus  write 

I(X;Y|Z)  = H(X|Z)  - H(X  Y , Z)  = H(XjZ)  - H(X_|Y)  (2.3.12) 

since  X is  independent  of  Z,  conditioned  on  Y.  Using 

H(A,B)  = H(A)  + H(B;A)  = H(B)  + H(A  B)  (2.3.13) 

(2.3.12)  becomes 

I(X;Y|Z)  = |H(X)  + H(Z | X)  - H(Z)  | - |H(X)  + H(YlX)  - H(Yl] 

= H(Z | X)  - H(Y | X)  - | H(Z)  - H( Y) ] . (2.3.14) 

Because  the  channel  is  memoryless 

N 

H(Y|X)  = X H ( Y . | X . ) = (N/2)  log  (2  e ) (2.3.15) 

i = l 11 

where  the  last  expression  comes  from  integration  as  in  (2.3.11) 

[3,  p . 32 | . Similarly 

H(Z  | X)  = (N/2)  log  [2ne(M  + )|.  (2.3.16) 

Substituting  (2.3.15)  and  (2.3.16)  into  (2.3.14)  yields  (2.3.10).  Q 


L L _ 
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Lemma  2.8 


Define 


g(P)  = 1/2  log  (2neP),  P>0 


g’1  (a)  = (l/27ie)  e2‘ 


A ( v ) = g 


a22  + g_1(v)  j -v 


Then  A (v)  is  monotonical ly  decreasing  in  v. 


Proof: 


r 

A(v)  = 1/2  log  | 2-r e(o2?  + e2v) 


-v . 


Differentiating  (2.3.20)  yields 


dA 

dv 


e2v/(2nea22  + e2v)  I - 1 < 0. 


(2.3.17) 

(2.3.18) 

(2.3.19) 


(2.3.20) 


(2.3.21) 


Lemma  2 . 9 

H(Y)  - Ng(P  + 0l?)  = (N/2)  log  |2ne(P  + ot2)]  (2.3.22) 

Proof : 

From  the  ordinary  converse  to  the  coding  theorem  [3],  we  know 

that 

I ( X;Y)  = H(Y)  - H(Y|X)  < NCM  (2.3.23) 

or 
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+ (N/2)  log  (2  e . ) 


H(Y)  (N/2)  log 

(P  + )/  V 

= (N/2)  log 

2 ne  ( P + i]  ) 

(2.3.24) 


Lemma  2 . 1 0 
If 

H(Y)  = Nv  (2.3.25) 

then 

H(Z)  - H(Y)  _•  NA(v)  = N g|y  + g_1  (v)  | -Nv.  (2.3.26) 


Remark: 

This  lemma  follows  from  Shannon's  convolution  inequality  for 
entropy  powers  which  states  that  the  entropy  power  of  the  sum  of  two 
independent  random  processes  is  at  least  the  sum  of  their  entropy  powers. 
Wyner  and  Ziv  (22 1 have  obtained  an  analogous  result  for  the  binary 
case,  known  as  Mrs.  Gerber's  Lemma. 


Proof: 

See  Shannon  |1,  Theorem  15 |,  Blachman  | 23 | and  Bergmans  |9|. 
Combining  lemmas  2.8,  2.9  and  2.10  we  see  that 


H(Z)  - H(Y)  _ N A | g(P  + a,; )| 


r -| 

N gj  cv  • + g_1g(P  + tj  ) j 


- Ng(P  + M ) 


= (N/2)  log 


P + a!2  + 


P + a 


(2.3.27) 
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Using  (2.3.27)  with  lemma  2.7  yields 


I(X;Y|Z)  | log  ^ ^ f log  ^ 


P + tj[  ■ + a. 


P + a j ' 


= (N/2)  log 


°i 


2 + P 


(N/2)  log! 


(o  ] 2 + 0 + P ' 


Ol‘ 


Oi  + 02 


N^CM  " CMW^ 


= NC$, 


(2.3.28) 


which  together  with  lemma  2.6  completes  the  proof  of  theorem  2.4. 


2.4  DISCUSSION 


2.4.1  THE  6AUSSIAN  WIRETAP  CHANNEL 

Figure  2.5  show  a sketch  of  the  set  * of  all  achievable  (R,d) 
pairs  defined  in  the  statement  of  theorem  2.2. 


CM  R 

Figure  2.5  - ACHIEVABLE  REGION  FOR  THE 
GAUSSIAN  WIRETAP  CHANNEL 


(2.4.1) 

(2.4.2) 

(2.4.3) 
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Theorem  2.2  shows  that  C$  almost  completely  characterizes  the  achievable 
(R.d)  region.  It  is  easy  to  see  that  C$  increases  monotonical ly  as 
the  power  constraint  P is  relaxed.  The  limiting  value  of  C for 

1 On2 

very  large  P is  ^ ^°9  (1  + ^~r)  • 

In  the  power  limited  region  (P  ■<  a2) 

CM  = P/(al-  2<?n  2)  (2.4.4) 

CMW  4 P/I («r  + o2?)  22n  2 ] (2.4.5) 

and 

Cs/CM  4 °2'/('i'  + o2')-  (2.4.6) 

In  the  bandwidth  limited  region  (P  o?) 

CM  ± 1/2  log  (P/o!?)  (2.4.7) 

CMW  ± 1/2  log  | P/ ( it-j 2 + a 2 ' ) | (2.4.8) 

so  that 

Cs  - 1/2  log  [(aj2  + j2')/ar  | (2.4.9) 

and 

CS/CM  * 0.  (2.4.10) 

Our  results  are  therefore  of  most  use  on  power  limited  channels.  Of 
course  if  the  main  channel  is  bandwidth  limited  (P/^  1)  and  the 

wiretap  channel  is  power  limited  (P/(  > + o2?)  1)  then  Cs/CM 

will  be  even  closer  to  1 . It  is  really  only  the  wiretap  channel  which 
must  be  power  limited. 
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A certain  amount  of  caution  should  be  exercised  when  using  theorem 
2.2.  Our  results  were  derived  on  the  assumption  that  we  had  perfect 
knowledge  of  the  main  channel  and  wiretap  channel  noise  statistics. 

In  practice  the  signal  to  noise  ratios  (SNR)  on  the  channels  may  be 
somewhat  uncertain.  In  this  case  the  system  may  be  operating  several 
dB  below  the  actual  capacity  of  the  main  channel,  and  if  the  wiretap 
channel's  SNR  is  less  than  this  amount  below  the  main  channel's 
SNR  then  secrecy  is  lost. 

But  in  spite  of  these  problems  there  may  be  practical  applications 
for  these  results.  If,  for  example,  the  wiretapper  is  listening  to 
unintentional  electromagnetic  radiation  from  a terminal  or  computer, 
his  SNR  may  be  tens  of  dB  down  from  that  of  the  "main  channel".  Such 
a wiretap  channel  allows  almost  no  reduction  in  rate  of  information 
flow  to  be  coupled  with  high  uncertainty  on  the  part  of  the  wiretapper. 


2.4.2  THE  GENERAL  DISCRETE  MEMORYLESS  WIRETAP  CHANNE L 

As  mentioned  in  the  previous  section,  the  complete  set  of  achievable 
(R,d)  pairs  for  the  Gaussian  wiretap  channel  is  characterized  by  the 
secrecy  capacity  C?.  Such  channels  (which  we  will  refer  to  as 
constant  r(R)  channels)  have  the  property  that  time-sharing  can  be 
used  to  obtain  the  whole  achievable  region  from  the  achievabi 1 i ty  of 
the  two  extreme  points.  In  this  section,  we  will  use  Wyner's  basic 
result  for  discrete  memoryless  wiretap  channels  to  deduce  a useful 
characterization  of  constant  r(R)  wiretap  channels.  Examples  of 
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such  channels  and  others  are  given. 

The  reason  for  referring  to  channels  characterized  by  the  secrecy 
capacity  Cs  as  constant  r(R)  channels  follows  from  theorem  2.1.  We 
recall  that 


r(R)  = max /D\ 1 (X;Y 1 Z)  (2.4.11) 

where  Px(‘)  is  a probability  distribution  on  the  input  of  the  main 
channel  and  ^(R)  is  the  set  of  px  such  that  I ( X ; Y ) >_  R.  We  can 
rewrite  I(X;YZ)  as 

I ( X ; Y | Z ) = H ( X ! Z ) - H( X | Y ,Z)  (2.4.12) 

= H ( X | Z ) - H ( X | Y ) (2.4.13) 

= I ( X ; Y ) - I ( X ; Z ) (2.4.14) 

where  in  (2.4.13)  we  have  used  tne  fact  that  X,Y  and  Z form  a 
Markov  chain. 


Theorem  2.5 

★ 

r(R)  is  constant  if  and  only  if  p^  maximizes  I ( X ; Y ) - I ( X ; Z ) 
★ 

where  p^  is  a capacity  achieving  distribution  on  the  main  channel. 
Proof: 

★ 

If  px  maximizes  I ( X ; Y ) - I(X;Z),  then 

r(R)  = I*  ( X ; Y ) - I*  (X;Z)  (2.4.15) 

px  px 

★ 

since  p^  p ( R ) for  0 <_  R ; C^.  Therefore  r(R)  is  a constant. 


51 


w 


★ 

Conversely,  suppose  px  does  not  maximize  I ( X ; Y ) - I ( X ; Z ) . Let 

l 

the  maximizing  distribution  be  denoted  by  pY  , and  let  1 • (X;Y)  = R , . 

x px 

Then,  it  is  clear  that  r(R|)  • r ( ) . This  shows  that  r(R)  cannot 
be  a constant.  □ 

We  have  already  seen  two  examples  of  constant  r(R)  channels, 
namely  the  Gaussian  wiretap  channel  and  the  simple  channel  of  figure  2.2. 
Some  more  examples  are  given  in  Appendix  III.  Here  we  look  at  a channel 
for  which  r(R)  is  not  constant. 


(c) 

Figure  2.6 

(a)  Main  channel 

(b)  Wiretap  channel 

(c)  Cascade  of  main  and  wiretap  channels 
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Referring  to  figure  2.1,  consider  the  case  in  which  the  main  and 
wiretap  channels  are  as  shown  in  figures  2.6(a)  and  2.6(b)  respecti vely. 
It  can  quite  easily  be  verified  that  the  cascade  of  these  two  channels 
is  equivalent  to  a BSC  with  a crossover  probability  e of  1/11  . 

We  can  evaluate  the  mutual  informations  between  X and  Y and  X 

and  Z to  obtain 

I ( X ; Y ) = h(0.1  + 0.9q  ) - (l-q0)  h(0.1)  (2.4.16) 

I ( X ; Z ) = h(1Y  + ^qg)  - h(|y)  (2.4.17) 

where  q^  = Pr  X = 0}  and  h(-)  is  the  binary  entropy  function. 

Figure  2.7  shows  a plot  of  I ( X ; Y ) and  I ( X ; Z ) against  qg  . 

The  maximum  of  I(X;Y)  occurs  at  qn  = 0.54  and  equals  0.76.  By 

definition,  this  value  is  also  the  capacity  of  the  main  channel. 
Because  of  the  symmetry  of  the  channel  from  X to  Z,  I ( X ; Z ) is 
maximized  at  qp=  0.5.  The  interesting  point  to  note  is  that  the 
difference  between  I ( X ; Y ) and  I ( X ; Z ) is  maximized  at  q^  = 0.69 
which  corresponds  to  a value  for  I ( X ; Y ) of  0.71.  The  fact  that 
r(R)  is  not  constant  is  illustrated  in  figure  2.8. 


MUTUAL  INFORMATION 


Figure  2.7  - PLOT  OF  I(X;Y)  and  I (X ;Z ) AGAINST  INPUT 
PROBABILITY  DISTRIBUTION  FOR  EXAMPLE  OF 
FIGURE  2.6 
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Figure  2.8  - r(R)  FOR  THE  WIRETAP  CHANNEL 
EXAMPLE  OF  FIGURE  2.6. 

We  now  give  a necessary  condition  for  r(R)  to  be  constant  for 
a special  class  of  discrete  memoryless  wiretap  channels. 

L_emma  2.11 

Let  the  main  channel  be  a discrete  memoryless  channel  (DMC)  with 

K inputs,  and  the  wiretap  channel  be  some  other  DMC.  Suppose  that 

★ ★ ★ 

I ( X ; Y ) is  maximized  at  a unique  input  distribution  px  = (p  (1),  p (2), 

★ ★ 

. ..,  p (K)  ) where  all  the  components  of  pv  are  strictly  positive. 


★ 

Then  a necessary  condition  for  r(R)  to  be  constant  is  that  p^  should 
be  a maximizing  distribution  for  I(X;Z). 


Proof: 


First  we  note  that  I ( X ; V ) and  I ( X ; Z ) are  both  concave  functions 
of  the  input  probability  assignment  p^  to  the  main  channel  |3,  theorem 


4.4.2|. 

We  can  handle  the  equality  constraint 


K 

. : p(i ) = 1 

i = l 


by  substituting 


K-l 

1 - p(i)  for  p(K)  in  the  expressions  for  1 ( X ; Y ) and  I(X;Z). 
i = l 

Thus  we  can  consider  maximizing  I ( X ; Y ) and  I ( X ; Z ) which  are  now 
functions  of  K-l  variables  subject  only  to  the  inequality  constraints 
P(  i ) j.  0- 


By  hypothesis,  I ( X ; Y ) has  a maximum  at  pY  and  p. 


1 24  | 


0.  Therefore, 


I ( X ; Y ) 
~p(i) 


K - 1. 


(2.4.18) 


★ 

Now  assume  that  p does  not  maximize 
at  least  one  j f. [1,  K-l]  such  that 


I ( X ; Z ) . Then  there  exists 


• I ( X ; Z ) 
^p(j) 


t 0 

★ 

PX 


(2.4.19) 


★ + 

Thus,  by  moving  away  from  px  along  the  direction  of  p(j)  , the 

Note  that  this  is  always  possible  since  we  assumed  that  all  the 
components  of  p*  are  strictly  positive. 

A 
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difference  between  I(X;Y)  and  I ( X ; Z ) can  be  made  to  increase  (at 

★ 

least  initially).  Therefore,  px  does  not  maximize  I ( X ; Y ) - I ( X ; Z ) 
and  by  theorem  2.5,  this  implies  that  r(R)  is  not  constant.  G 

Remark : 

Lemma  2.11  can  be  extended  in  a straight  forward  manner  to  cover 
the  case  where  I (X ; Y ) is  maximized  at  non-unique  but  strictly  positive 
input  distributions. 

Corollary  1 : 

Under  the  assumptions  of  lemma  2.11,  if  l'(R)  is  a constant,  then 

f (R)  = (2.4.20) 


where 


and 


CM  = max  I ( X ; Y ) 
PX 


CMW  = m3X  ^X;Z) 
PX 


Proof: 

r(R)  = max  ( I ( X ; Y ) - I (X ;Z)  ] 

Pxcp(R) 

Since  r(R)  is  a constant,  we  have 
r(R)  = r (CM) 

= I*  (X;Y)  - I * (X;Z) 

PX  PX 


(2.4.21  a) 


(2.4.21  b) 


(2.4.22) 


(2.4.23) 

(2.4.24) 


(2.4.25) 


where  in  (2.4.25)  we  have  used  lemma  2.11  to  obtain  I * (X;Z)  = 

PX 

CMW  ’ 0 

Coro  1 1 ary  2: 

r(R)  is  not  constant  for  the  wiretap  channel  example  of  figure  2.6. 

2.4.3  MON- DEGRADED  WIRETAP  CHANNELS 

Thus  far  we  have  modeled  the  wiretapper's  channel  as  a degraded 
form  of  the  main  channel.  While  this  model  is  very  suggestive  and 
often  arises  in  practical  situations,  there  may  be  situations  in  which 
the  notion  of  degradation  is  not  applicable.  We  now  give  an  example  of 
a very  simple  non-degraded  wiretap  channel  in  which  the  "main  channel" 
can  be  operated  at  capacity  whilst  the  wiretapper  can  be  kept  totally 
ignorant  of  the  intended  message. 

The  example  is  the  orthogonal  broadcast  channel  |4|  shown  in 
figure  2.9.  The  connecting  lines  denote  transition  probabilities  of  1. 
For  simplicity,  let  us  assume  that  the  source  is  binary  symmetric.  We 
can  send  reliably  to  V (the  legitimate  receiver)  at  the  maximum 
possible  rate  of  1 bit  per  channel  use  whilst  keeping  Z (the  wire- 
tapper)  totally  ignorant  of  the  source  output,  by  using  only  the  two 
input  letters  labelled  1 and  3.  If  the  source  output  is  0,  a '1' 
is  transmitted.  Otherwise  a '3'  is  sent.  It  can  easily  be  seen  that 
this  scheme  achieves  a point  (R,d)  = (1,1). 
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Figure  2.9  - NON-DEGRADED  WIRETAP  CHANNEL 

It  may  seem  that  somehow  the  above  scheme  is  not  making  full  use 
of  the  capabilities  of  the  channel.  If  we  consider  Y and  Z to  be 
two  receivers  which  are  to  learn  only  the  messages  specifically  destined 
for  them  and  not  each  other's  message,  we  can  use  the  coding  scheme 
shown  in  Table  2.1.  Si  and  S,>  are  the  source  outputs  of  two  independent 


s 

s 

X 

0 

0 

1 

0 

1 

2 

1 

0 

3 

1 

1 

4 

Table  2.1  - CODING  SCHEME  FOR  CHANNEL  OF  FIGURE  2.9. 


It  can  readily  be  verified  that  this  scheme  allows  reliable  trans- 
mission at  the  maximum  possible  rates  to  both  receivers,  but  each 
receiver  is  kept  totally  ignorant  of  the  other's  message. 

The  privacy  problem  can  be  extended  to  networks.  We  might  mention 
a two-sender,  two-receiver  network  in  which  each  sender  wishes  to  com- 
municate reliably  with  its  intended  receiver  while  keeping  the 
other  receiver  as  ignorant  as  possible  of  the  message.  The  problem  now 
consists  of  four  parameters  of  interest: 

Rj,  i = 1,  2,  the  transmission  rate  to  receiver  i from  sender  i. 

d-j,  i = 1,  2,  the  equivocation  about  the  message  intended  for  the 
other  receiver  at  receiver  i. 
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CHAPTER  3 

THE  WIRETAP  CHANNEL  WITH  FEEDBACK 

3.1  INTRODUCTION 

In  the  previous  chapter,  we  examined  the  Gaussian  wiretap  channel 
which  is  included  in  the  class  of  degraded  wiretap  channels  (i.e.  the 
wiretapper's  data  is  a degraded  version  of  the  legitimate  receiver's 
data).  This  class  of  channels  possesses  a certain  symmetry  between  the 
legitimate  receiver  and  the  wiretapper:  both  are  just  observers  of  the 

data  coming  over  their  respective  channels.  If  the  two  channels  are 
statistically  equivalent,  then  the  legitimate  receiver  is  no  better  off 
than  the  wiretapper  and  no  rate-equivocation  pair  (R,d)  such  that  Rd  0 
can  be  achieved.  Therefore  the  approach  taken  in  chapter  2 (except  for 
section  2.4.3)  is  suited  for  channels  in  which  it  is  known  that  the  wire- 
tapper's channel  is  a degraded  form  of  the  main  channel. 

In  practice  this  would  hopefully  be  the  case.  However,  there  may  be 
situations  in  which  the  wiretapper's  channel  is  as  good  as  or  maybe  even 
better  than  the  main  channel.  In  this  case,  the  only  hope  remaining  to 
confuse  the  wiretapper  is  to  destroy  the  symmetry  mentioned  earlier.  Here 
we  will  consider  one  way  of  achieving  this,  namely  by  allowina  feedback 
from  the  legitimate  receiver. 

The  model  which  we  wish  to  analyze  is  shown  in  figure  3.1.  Unlike 
figure  2.1  where  the  input  to  the  wiretap  channel  is  the  output  of  the 
main  channel,  we  now  allow  the  wiretapper  to  view  the  output  of  the  encoder 
directly  through  his  channel.  The  decoder  at  the  legitimate  receiver  is 
permitted  to  send  back  information  to  the  encoder  throuqh  a noiseless, 
infinite-capacity  feedback  link.  The  wiretapper  is  not  allowed  to  inject 
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Figure  3.1  - MODEL  FOR  WIRETAP  CHANNEL  WITH  FEEDBACK 


data  into  the  feedback  channel  even  though  he  might  wish  to  do  so  in  an 
attempt  to  mislead  or  confuse  the  encoder.  This  would  be  a qood  model  for 
the  practical  situation  in  which  the  wiretapper  does  not  wish  to  make  his 
presence  known. 

If  the  feedback  link  is  private  (i.e.  the  wiretapper  cannot  qain 
access  to  the  data  which  is  fed  back)  a one-time  pad  cryptographic 
system  [18,  31]  could  be  used  to  achieve  R = C^,  d - 1.  That  is,  data 
could  be  sent  to  the  legitimate  receiver  at  the  maximum  possible  rate 
consistent  with  reliable  communication  and  at  the  same  time  the  wiretapper 
could  be  completely  foiled.  To  accomplish  this,  the  decoder  sends  back 
a totally  random  sequence  independent  of  all  else,  which  the  encoder 
exclusive-ors  with  the  data  to  be  transmitted.  The  random  sequence  is 
subtracted  out  at  the  legitimate  receiver  and  so  does  not  affect  the  rate 
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of  reliable  transmission.  But  it  acts  as  a totally  noisy  channel  as  far 
as  the  wiretapper  is  concerned. 

For  the  remainder  of  this  chapter,  we  consider  the  harder  problem  in 
which  the  feedback  link  is  public,  i.e.  any  data  fed  back  from  the  legiti- 
mate  receiver  to  the  encoder  is  available  to  the  wiretapper.  We  now 
focus  our  attention  on  an  interesting  wiretap  channel  with  feedback  and 
derive  inner  and  outer  bounds  on  the  achievable  (R,d)  region. 


3 . 2 THE  BINARY  ERASURE  WIRETAP  CHANNEL 
3.2.1  AN  ACHIEVABLE  RATE-EQUIVOCATION  REGION 

The  model  of  the  binary  erasure  wiretap  channel  is  as  shown  in 
figure  3.1  with  the  main  and  wiretapper's  channels  being  independent 
binary  erasure  channels  of  erasure  rates  ^ and  ^ respectively.  For 
simplicity,  we  shall  assume  that  the  source  is  binary  symmetric. 

We  shall  first  prove  the  following  result  which  establishes  an 
innerbound  on  the  achievable  (R,d)  region. 

Theorem  3. 1_ 

The  set  of  (R,d)  pairs  defined  by 


R < (1-r,)  . CH 

(3.2.1) 

d < 1 

(3.2.2) 

Rd  • ‘ 2^"'  1)2 

(3.2.3) 

is  achievable. 
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We  shall  establish  theorem  3.1  by  showing  that  the  extreme  points 


1 

(R1,d1)  = (l-er  2(l-tl)/(l-  ] 2)) 

(3.2.4) 

and 

(R2,d2)  = (r2(l-  , )2/(l-t  f-2)  , 1) 

(3.2.5) 

are  achievable.  We  can  then  invoke  the  time  sharing  argument  of  lemma  2.1 
to  conclude  that  the  entire  region  defined  in  theorem  3.1  is  achievable. 

The  point  (R-|,d^)  can  be  achieved  using  the  following  scheme.  The 
source  outputs  are  sent  directly  over  the  channels  with  no  encoding.  When- 
ever the  legitimate  receiver  gets  an  erasure,  he  asks  for  a retrans- 
mission until  he  learns  the  bit  being  transmitted.  Thus  his  probability 
of  decoding  error  will  be  zero.  The  probability  that  a bit  sent  over 
the  main  channel  will  result  in  an  erasure  is  Therefore  the  number 

of  channel  uses  required  for  a successful  transmission  has  a geometric 
distribution  with  parameter  1-l,  and  the  expected  number  of  channel  uses 
per  bit  is  l/(l-e^). 

If  we  consider  transmitting  a long  sequence  of  source  outputs  over 
the  main  channel,  the  transmission  rate  will  be  (1-r^)  bits/channel 
use. 

Let  us  now  calculate  the  equivocation  d^  of  the  wiretapper  resulting 
from  this  coding  scheme.  Clearly, 

d^  = Prfwiretapper  misses  a bit!  (3.2.6) 

= Prlmain  channel  output  shows  a non-erasure  bit  before 
wiretapper's  channel  output!  (3.2.7) 

= c2(l-f  i ) + e22  c-jO-r^)  + K\  (I'  l)+* - • 

+ ^ V"1^-^) + •••  0.2.8) 
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(3.2.9) 


= egO-e^/O-E^)  (3.2.10) 

Thus , 

= c2(l-c1)/(l-e1£:2)  (3.2.11 ) 

and  this  proves  that  (R^,d^)  is  achievable. 

To  complete  the  proof  of  theorem  3.1,  we  need  to  show  that  (R^.d^) 

is  also  achievable.  Suppose  we  wish  to  transmit  k source  outputs  reliably 

to  the  legitimate  receiver.  Then  we  are  allowed  k/R^  channel  uses  where 
2 

R9  = c2(l-e-|)  /(l-t,f.2).  But  on  the  average,  the  successful  transmission 

of  1 bit  of  information  to  the  legitimate  receiver  reguires  1/(1 -e ^ ) 

channel  uses.  Therefore  in  k/R^  channel  uses,  we  can  reliably  transmit 

k( l-^i )/R2  = k(1-e,e2)/e2(l-c^ ) bits  of  information.  (Note  that  the 

preceeding  argument  is  not  strictly  rigorous.  For  example,  to  be  precise 

we  should  have  stated  that  as  the  probability  that  we  can  transmit 

k{[(1-Ci )/R2]-6)  bits  in  k/ R^  channel  uses  tends  to  1 for  every  6>0.  We 

omit  these  details  in  order  to  simplify  the  proof.  This  remark  will  also 

apply  to  some  subseguent  arguments  in  this  section). 

The  idea  now  is  to  tag  on  k[  -1]  totally  random  bits  to  the  k 

R2 

source  outputs.  An  invertible  prescrambling  operation  similar  to  that 
of  Heilman  and  Carleial  [20]  is  then  used  to  obtain  the  codeword  (of 
length  k(l-t^)/R2)  to  be  transmitted.  This  prescrambling  operation  will 
be  specified  shortly  and  will  be  used  to  prove  that  the  wiretapper  can 
be  kept  totally  ignorant  of  the  k source  outputs  intended  for  the  leqiti- 


Pr  (wiretapper  misses  a bit}  = f_ 2 ( 1 - c ^ )/ ( 1 -t:, r^) . (3.2.12) 

So  the  number  of  bits  in  the  codeword  which  the  wiretapper  misses  is 
[k(l-e1)/R2]  £2(1-c1)/(1-c1c2) 

= k . (3.2.13) 

We  now  prove  the  following  theorem  which  will  establish  the  achievabi 1 i ty 

of  the  point  (R2,  d2) . 

Theorem  3.2 

k n-k 

Let  s^  and  r be  independent,  totally  random  binary  column  vectors. 
Let 


where  A is  chosen  randomly  and  uniformly  from  the  set  of  n x n invertible 
binary  matrices.  Suppose  that  z is  equal  to  x except  for  k erasures. 
Then,  over  this  ensemble  of  codes 

Pr  {A  : H(S|Z)2  k(l-A)}  = l-6(n)  (3.2.15) 

for  all  A > 0.  The  symbol  S(n)  stands  for  some  quantity  which  tends  to 
0 as  n ■*  «>  . 

Proof: 

Using  H(A)  + H ( B | A ) = H(B)  + H ( A | B ) we  find  that 

H(S | Z)  = H(S)  + H(Z|S)  -H(Z)  (3.2.16) 

= H(S)  + [H(X | S)  + H(Z|X,S)  -H(X | Z,S) ] -H(Z)  (3.2.17) 

Since  A is  an  invertible  matrix  we  have 

H(X | S)  = n-k  (3.2.18) 

Also  since  x completely  determines  s, 

H(Z|X,S)  = HUIJ^)  (3.2.19) 
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Therefore, 

H(Z)  -H(Z|X,S)  = I (X ;Z) 

= n-k 

Using  (3.2.18)  and  (3.2.21)  in  (3.2.17)  yields 
H(SjZ)  = H(S)  -H(X|Z,S) 

The  proof  of  theorem  3.2  is  completed  by  applying  lemma  3.1. 
Lemma  3.1 


Pr  (A  : H( X | Z ,S)  s £}  > 1-2' 


Proof: 


From  (3.2.14)  we  have 


= A'1  x = B x 


has  the  same  distribution  as  A. 


(3.2.20) 

(3.2.21) 

(3.2.22) 


(3.2.23) 


(3.2.24) 


With  no  loss  of  generality,  let  us  assume  that  the  last  k bits  of 
z = (z^,  z^,  ...,  zn)  are  erasures,  i.e.. 


Z1  = X1 
Z2  = X2 


(3.2.25) 


z , = x . 
n-k  n-k 


We  know  that 


S1  = -1  - 

T 

s2  = b2  x 


(3.2.26) 


Sk  ‘ - 


where  b^  is  chosen  uniformly  from  all  binary  n-vectors  not  in  the  span  of 
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lb-|,  b^,  b ^ ^ , and  b-j  is  chosen  uniformly  from  all  non-zero  n- 

vectors.  Now 


H(Xj_Z,S)  = n - rank  M 


where 


n 


M1  = 


‘10.. 
0 1.. 

0 0 . . 1 

*,T 


. 0‘ 
. 0 


n-k 


(3.2.27) 


(3.2.28) 


Let  us  define  a matrix  as 


n 


m = 

2 


1 0 . . . 0 

0 1 . . . 0 

6 0 ••  1 ..  0 

-‘,T 


n-  k 

I 


(3.2.29) 


L 

where  c^T  , l<i<k,  is  chosen  uniformly  from  all  binary  n-vectors.  It 
is  intuitively  clear  that 

Pr  (rank  M] > j}  > Pr  frank  > j } . (3.2.30) 

A rigorous  proof  is  given  in  lemma  3.2.  The  probability  that  rank 
n-£  for  0<£^k-l  is  lower  bounded  by 


2n  ^n-k 


>-k, 


2n  _ ^n-k+l 


(1  - 2"K)  (1  - 2'k+1)  ...  (1  - 2'*_l) 


2n 


-£-l 


(3.2.31) 

(3.2.32) 
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(3.2.33) 

(3.2.34) 


(1  - 2-"1)  (1  - 2'"2)  . . . 


where  in  (3.2.34)  we  have  used  the  fact  that 


n n 

t (1  - x.)  > 1 - 5.  a.  , a.  > 0.  (3.2.35) 

i=l  1 i=l  1 1 


From  (3.2.30)  we  conclude  that 

Pr  1 rank  M,  > n-U  > 1 - 2'?'  (3.2.36) 

and  usinq  (3.2.27)  we  obtain  the  desired  result,  i.e., 

Pr  fA  : H(X|Z,S)  <£}>!-  2_?.  (3.2.37) 


Lemma_3. 2 

Pr  frank  M,  > j } ^ Pr  {rank  > j]  (3.2.38) 

I C 


Proof: 

Let  us  choose  the  { b ^ } and  {c^}  as  follows: 

We  first  choose  c^  U {0,  1 }n  i.e.,  according  to  a uniform  distribution 
on  {0,  1 }n.  If  c^  f 0,  then  = c ^ . Otherwise,  we  choose  again  until 
we  obtain  a non-zero  b-j . In  general,  we  choose  c^  U {0,  l}n  and  if 
c • is  not  in  the  span  of  f b^ , ...,  b^  -j } , then  b^  = c . . Otherwise,  we 
draw  again  until  we  get  a b^  i span  { b ^ , ...,  Thus  we  conclude 

that  the  span  of  {b, , ...,  bk ( is  at  least  as  large  as  the  span  of 
•cr  ...,  c^}.  □ 
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3.2.2  DISCUSSION  OF  ACHIEVABLE  REGION 


In  this  section  we  shall  discuss  some  of  the  implications  of  theorem 
3.1.  We  examine  three  separate  cases. 

Case  I:  > ^ = ? = ■ . 

In  this  case,  theorem  3.1  states  that  the  region  defined  by  R<1-  , 

d<l  and  Rd  < (1-  )/(l  + ) can  be  achieved.  If  r = 0.5  and  d = 1,  then 
a rate  of  0.5/3  is  achievable.  This  means  that  even  though  the  wiretapper's 
channel  is  as  good  as  that  of  the  legitimate  receiver,  feedback  allows 
totally  secure  transmission  up  to  a rate  at  least  equal  to  a third  of  the 
main  channel  capacity.  A plot  of  (1-e)/(1+  ) as  a function  of  > is 
sketched  in  fiqure  3.2. 

Case  II:  c,  = 0,  = r . 

Here  the  region  which  can  be  achieved  is  given  by  R < 1 , d < 1 , Rd < . 
This  is  the  situation  in  which  the  main  channel  is  noiseless  and  from 
Appendix  1 1 , we  know  that  when  no  feedback  is  allowed,  the  set  of  all 
achievable  (R,d)  pairs  is  identical  to  the  region  defined  above.  This 
is  really  not  very  surprising  as  will  be  shown  in  the  next  section. 

Case  III:  = 0.5,  e„  = r.  . 

The  achievable  region  is  now  lowerbounded  by  R ^ 0.5,  d < 1 , Rd  ^ 

■ / (4-2- ) . Figure  3.3  shows  a plot  of  >/(4-20  as  a function  of 
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We  might  also  point  out  that  as  long  as  ^ 1 and  ^ is 

possible  to  transmit  at  a positive  rate  in  perfect  secrecy. 


3.2.3.  AN  OUTERBOUND  ON  THE  ACHIEVABLE  RATE-EQUIVOCATION  REGION 

We  shall  develop  an  outerbound  for  the  binary  erasure  wiretap  channel 
with  feedback.  The  generalization  of  this  outerbound  to  channels  with 
D-ary  input  and  E-ary  output  alphabets  is  easy. 

Theorem  3.3 

The  set  IR^  of  all  (R,  d)  pairs  which  can  be  achieved  on  the  binary 
erasure  channel  with  feedback  is  contained  in  the  set  of  all  achievable 
(R,  d)  pairs  when  the  main  channel  is  made  noiseless  and  no  feedback  is 
al lowed. 

Remark:  In  general,  the  main  channel  is  made  into  a noiseless  channel 
with  M-ary  input  and  output  alphabets  where  M = min  :D,  El  . 

Proof:  follows  from  lemmas  3.3  and  3.4. 


Lemma  3.3 

The  set  31 1 defined  in  theorem  3.3  expands  if  the  main  channel  is 
replaced  by  a noiseless  binary  channel. 

Proof: 

The  original  binary  erasure  wiretap  channel  can  be  recovered  by 
cascading  a binary  erasure  channel  of  erasure  rate  c^  with  the  noiseless 
channel . □ 
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Lemma  3.4 


If  the  main  channel  is  noiseless,  feedback  cannot  increase  the 
achievable  (R,  d)  region. 

Proof: 

Since  the  main  channel  is  noiseless,  the  encoder  knows  exactly  the 
data  received  by  the  legitimate  receiver.  Therefore  the  feedback  inform- 
ation should  not  depend  on  this  data  since  otherwise  we  would  be  needlessly 
allowing  the  wiretapper  to  gain  extra  information.  Any  "feedback  informa- 
tion" could  hence  have  been  conveyed  to  both  legitimate  receiver  and 
wiretapper  prior  to  using  the  channel,  and  can  be  considered  to  be  part 
of  the  code.  The  feedback  link  is  thus  unnecessary.  □ 

Remark : 

This  lemma  explains  the  results  of  case  II  in  section  3.2.2. 

When  the  main  channel  is  "not  very  noisy",  theorem  3.3  will  yield 
a fairly  tight  outerbound . Unfortunately,  for  "noisy"  main  channels,  we 
expect  this  bound  to  be  very  weak.  The  proof  of  a tight  outerbound 
remains  an  open  problem. 
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CHAPTER  4 


FEEDBACK  I N MULTI PLE-ACCESS  CHANNELS 

4 . I INTRODUCTION 

This  chapter  is  concerned  with  the  use  of  noiseless  feedback  to 
improve  the  performance  of  communication  systems.  We  begin  with  a brief 
review  of  some  results  concerning  single-input  single-output  channels 
(see  figure  1.1).  In  certain  communication  problems,  a noiseless  feed- 
back link  is  available  and  may  be  used  to  improve  communication  over  a 
noisy  forward  link.  An  example  is  communication  with  a satellite:  the 

power  in  the  ground-to-satel 1 i te  direction  is  usually  so  much  larger 
than  in  the  reverse  direction  that  the  first  link  can  be  considered  to 
be  essentially  noiseless. 

One  would  expect  that  the  use  of  the  feedback  link  should  somehow 
facilitate  the  task  of  sending  information  from  the  source  to  the 
destination.  Therefore,  Shannon's  result  [25,  theorem  6]  that  the 
capacity  of  a menioryl ess  forward  channel  is  not  increased  by  noiseless 
feedback  is  quite  surprising.  However,  it  is  possible  to  take  advantage 
of  feedback  to  reduce  decoding  delay  and  system  complexity  [26]. 

More  recently,  Gaarder  and  Wolf  [27]  showed  by  an  example  that  it 
is  possible  to  increase  the  capacity  region  of  a discrete  memoryless 
multiple-access  channel  through  the  use  of  feedback.  The  example  which 
they  used  is  the  noiseless  binary  erasure  multiple-access  channel  shown 
in  figure  4.1. 
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Figure  4.1  - NOISELESS  BINARY  ERASURE  MULTIPLE-ACCESS  CHANNEL 
The  transition  probabilities  for  the  channel  are  indicated  in  table  4.1 


Y 


X1X2 

0 

1 

e 

0 0 

1 

0 

0 

0 1 

0 

0 

I 

I 0 

0 

0 

1 

1 1 

0 

1 

0 

Table  4.1  - TRANSITION  PROBABILITIES  TOR  NOISELESS 
BINARY  ERASURE  MULTIPLE-ACCESS  CHANNEL 


Using  the  results  of  section  1.4  (specifically  (1.4.2))  we  find  that  the 
capacity  region  for  the  above  channel  without  feedback  is 

C-  ={(R],  R2)|0  < R]  < 1,  0 < R2  < 1,  0 <R1  + R2<  1.5>  (4.1.1) 

where  R. , i=l,  2 is  the  transmission  rate  from  the  ith  transmitter. 

Gaarder  and  Wolf  [27]  show  that  the  rate  pair  (R^,  R^)  = (0.76,  0.76), 
which  falls  outside  the  region  described  in  (4.1.1),  is  achievable. 

Here  we  look  at  a feedback  scheme  for  enlarging  the  capacity  region 
of  general  multiple-access  channels  [28],  The  scheme  consists  of  two 
stages.  During  the  first  stage  (stage  1),  the  two  transmitters  send 
information  reliably  to  each  other  at  the  maximum  possible  rate.  Stage 
1 ends  when  each  transmitter  has  complete  knowledge  of  the  other's  message. 
Since  each  transmitter  knows  what  it  sent  and  sees  the  received  data 
through  tne  feedback  link,  this  occurr  before  the  receiver  gains  complete 
knowledge  of  the  messages.  That  is,  the  rate  of  transmission  in  stage  1 
will  be  too  high  for  reliable  transmission  to  the  receiver.  However, 
the  stage  1 transmissions  will  enable  the  receiver  to  narrow  down  the 
set  of  possible  transmitted  messages  to  a considerably  smaller  set  of 
"typical"  messages.  With  probability  arbitrarily  close  to  1,  the  set  of 
"typical"  messages  will  contain  the  actual  transmitted  message.  The 
receiver  and  the  two  transmitters  then  arrange  the  messages  in  the 
"typical"  set  in  some  prearranged  ordering.  This  sets  up  stage  2 
during  which  the  two  transmitters  totally  cooperate  to  send  the  index 
of  the  actual  transmitted  message. 

In  retrospect,  it  is  interesting  to  note  that  the  scheme  used  by 
Gaarder  and  Wolf  is  a special  case  of  the  scheme  proposed  here. 
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Certain  results  and  notations  concerning  multiple-access  channels 
can  be  found  in  section  1.4  and  will  not  be  repeated  here.  We  will  show 
that  the  proposed  feedback  scheme  increases  the  achievable  rate  region 
for  the  additive  white  Gaussian  noise  (AWGN)  multiple-access  channel. 

In  section  4.4,  we  make  a digression  on  typical  sequences  and  recall 
some  basic  results  which  will  be  needed  in  section  4.5,  where  we  consider 
discrete  memoryless  multiple-access  channels.  But  first  we  introduce 
the  model  of  the  multiple-access  channel  with  feedback  and  some 
additional  notation. 

4 . 2 FEEDBACK  CHANNEL 

The  memoryless  multiple-access  channel  with  feedback  is  depicted  in 
figure  4.2  below. 


: 


Figure  4.2 


MULTIPLE-ACCESS  CHANNEL  WITH  FEEDBACK 
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The  ith  symbol  x^.  sent  by  transmitter  1 depends  upon  the  message 
index  k that  transmitter  1 wishes  to  send  and  upon  the  previous  receiver 
symbols  y-j  ,y2>  • • • >y..-_-|  • A similar  statement  holds  for  the  ith  symbol 
x^  sent  by  the  second  transmitter. 

In  the  remainder  of  this  chapter,  the  following  notation  will  be 

used: 

R -| 2 = transmission  rate  from  the  first  to  the  second  transmitter 
during  stage  1 . 

R^l  = transmission  rate  from  the  second  to  the  first  transmitter 
during  stage  1 . 

R1  = overall  (considering  both  stages)  rate  from  the  first 
transmitter  to  the  receiver. 

= overall  rate  from  the  second  transmitter  to  the  receiver. 
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4.3  the;  awgn  multiple-access  channel 


For  illustration,  we  will  analyze  in  detail  the  symmetric  case  in 
which  the  average  power  constraints  on  both  inputs  are  the  same,  namely 
= P p = P.  We  will  then  show  how  to  generalize  the  scheme  to  the 
asymmetric  case. 

The  basic  idea  is  for  each  transmitter  to  send  at  full  power  to  the 
receiver,  but  at  rates  corresponding  to  the  capacity  of  the  channel  when 
the  transmitter's  own  signal  is  subtracted  out.  Thus  each  transmitter 
very  quickly  learns  what  the  other  transmitter  is  sending.  However, 
the  receiver  Y is  still  confused  because  the  total  rate  has  exceeded 
his  capacity.  In  stage  2,  the  transmitters  now  use  coherent  transmission 
to  send  the  missing  bits  in  the  receiver's  knowledge  of  the  intended 
messages . 


Figure  4.3  - AWGN  MULTIPLE-ACCESS  CHANNEL  WITH  FEEDBACK 
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The  model  for  the  AWGN  multiple-access  with  feedback  is  shown  in 

figure  4.3.  Note  that  we  are  denotinq  the  noise  variance  by  N instead  of 
2 

a in  order  to  simplify  notation.  Our  proof  that  feedback  can  enlarge 

the  achievable  rate  region  makes  use  of  the  usual  random  coding  argument. 

Consider  the  ensemble  of  randomly  generated  codes  of  blocklenqth  n and 

rates  = R^  = R*  obtained  as  follows.  Each  codeword  is  an  n- 

sequence  of  independent  outcomes  of  a zero-mean  Gaussian  random  variable 

with  variance  P'  = P-n  where  n > 0 is  a quantity  which  will  be  made  to 

nR.  „ 

tend  to  zero.  We  generate  2 independent  such  codewords.  This  will 
be  the  code  used  by  the  first  transmitter  during  stage  1.  Similarly 
the  code  used  by  the  second  transmitter  during  staqe  1 is  obtained  by 

nRpi 

generating  2 independent  codewords  as  above. 

nR* 

Let  us  suppose  that  we  wish  to  send  one  of  2 independent  equipro- 

bable  messages  to  Y from  each  source.  Because  of  the  symmetry  induced 

by  the  random  coding,  we  can  assume  the  messages  actually  transmitted  to 

be  (1,  1).  From  the  results  on  coding  for  single-input  sinqle-output 

AWGN  channels,  we  know  that  at  the  end  of  staqe  1,  x^  can  be  quessed  at 

the  second  transmitter  with  arbitrarily  small  probability  of  error,  say 

P < e/5  , e > 0 if 
1.2 

R^2  5 2 0 + jp)  bits/transmission  (4.3.1) 

and  the  blocklenqth  n is  sufficiently  large.  Similarly,  x^  can  be  es- 
timated at  the  first  transmitter  with  arbitrarily  small  probability  of 
error,  say  P < e/5  , e > 0 if 

eu 

R21  - \ 1og  (]  + lr)  bits/transmission  (4.3.2) 
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and  n is  sufficiently  large.  In  particular,  we  can  set  R,^  = = R*  = 

2 log  ^1  + . We  note  that  stage  1 requires  n transmissions.  Let 

us  denote  the  n-sequence  received  at  Y during  stage  1 by  y.  In  the 

2nR* 

following,  we  shall  be  concerned  with  the  set  of  2 codewords  ob- 
tained by  taking  all  possible  sums  x = . By  the  independence  of 

Xj  and  x^,  each  component  of  x is  a zero-mean  Gaussian  r.v.  with 
variance  2P‘  . We  shall  say  that  a codeword  x is  linked  to  y if  x lies 
within  the  n-sphere  of  radius  \/n(N+-  ) centered  at  y.  By  the  law 
of  large  numbers,  we  know  that  with  probability  P = 1 - c/5  , c > 0 
the  correct  codeword  will  be  linked  to  y for  n sufficiently  large. 

We  now  proceed  to  consider  the  set  Sy  of  codewords  which,  at  the 
end  of  stage  1,  are  linked  to  y.  Let  !Sy  denote  the  cardinality  of 

S . Then  the  averaqe  IS  I , taken  over  the  ensemble  of  random  codes 
y y 

and  possible  input  messages  is 

E |Sy|  = 2L]  + 4 + Pc  (4.3.3) 

(see  (4.5.16)  for  a full  explanation  in  a general  context) » where 
L-|  = expected  number  of  codewords  linked  to  the  received  sequence  when 

nR* 

a code  with  (2  -1)  independent  codewords  whose  n components  are 

generated  independently  according  to  a zero-mean  Gaussian  r.v. 
with  variance  P'  is  used  over  an  AWGN  cahnnel  with  noise  variance  N. 

Ly  = expected  number  of  codewords  linked  to  the  received  sequence  when 

nR*  ^ 

a code  with  (2  -1)  independent  codewords  whose  n components  are  in- 

dependent zero-mean  Gaussian  r.v.  with  variance  2P1  is  used  over  the 
same  channel . 

From  Shannon  [29]  we  know  that  for  sufficiently  large  n,  V 0, 
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(4.3.4) 


L]  < ( 2nR*-l ) 2 


L2  < (2nR*-l)2  2 


-n  [1  log  (1  + jjj-)  - eJ 
p . / , 2P 1 v ] 

,-"[2  109  (1  + 1T}  - £J 


(4.3.5) 


where  we  have  used  the  union  bound  in  (4.3.4)  and  (4.3.5).  So,  Y-  e>  0, 

n , , , , P ' \ n •,  n , ?P\ 

2L,  * 4 < 2nR*  (2  • 2'  2 9 N + 2nR*.  2 ' 2 9 N ) 2nc 


= (2  + 2 


t n['°9  0 + r>  - 2 ’°9  (1  + TP 


(4.3.6) 


2n£  (4.3.7) 


P'  1 Op ' p' 

log  (1  + jj-)  > \ 1 og  (1  + -jj-)  if  > 0 . 
Therefore  for  n sufficiently  large  and  e'  > e , 


(4.3.8) 


L s 2"  [l09  (1  + r>  - 2 109 


21,  4 4 


Using  Markov's  inequality,  we  obtain 


(1+^)+E']  A k. 


(4.3.9) 


A ne,  -ne, 

Pfa  = Pr  | Sy | > K2  ^ 2 1 , e]  > 0 


< e/5,  V e > 0 as  n + «>  . 


(4.3.10) 


Note  by  inspection  of  (4.3.9)  that  in  stage  1 the  uncertainty  of 
the  receiver  is  reduced  by  7}  log  (l  + \ bits,  precisely  that  which 

could  be  obtained  (without  feedback)  if  1 and  2 were  actually  trying 
to  communicate  with  Y rather  than  with  each  other. 
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Let  Pp  = Pr  | ' x.|  (1 ) 


where 


! X.(1)  !! 


j=l 


^ nP  or  x^(  1 ) ^ 


xf.(l)  , i - 1,  2. 
lj' 


nPj-  (4.3.11) 
(4.3.12) 


1 2 

Now,  n xi.j(l)  is  the  arithmetic  average  of  n independent  identi- 

cally distributed  random  variables  with  expected  value  E (x^(l))  = P'  P. 

By  the  law  of  large  numbers,  we  know  that  Pr  | ' x.(l)  ^ nP,  i = 1,  2* 
can  be  made  arbitrarily  small.  By  the  union  bound,  we  can  let  Pp 
Define  E^  as  the  event  that  stage  1 is  successful,  i.e., 

(1)  x, ( 1 ) ’ 2 < nP  and  x^l)!  ^ < nP  so  that  the  codes  satisfy  the 
power  constraints. 

(2)  x.|  and  x^  are  correctly  decoded  at  transmitters  2 and  1 respectively. 

(3)  the  correct  codeword  x(l)  is  linked  to  the  received  sequence  y. 

no, 

(4)  SJ  K2 


1 


Let  E^  denote  the  complement  of  this  event.  Then,  by  the  union  bound, 


Pr  {E?}  < P + P + (1-P  ) + P + Pc  < e (4.3.13) 

l e1?2  e^1  c b l- 

where  Pr  • denotes  expectation  over  the  choice  of  codebooks  and  possible 
input  messages. 

Therefore,  there  exists  at  least  one  code  which  satisfied  the  power 
constraints  and  has  an  average  probability  of  failure  in  stage  1 less 
than  f . Since  we  can  choose  n to  be  arbitrarily  small,  the  limiting 
value  of  R*  is  l log  (1  + J^-)  . 

Assuming  the  first  stage  is  successful,  in  stage  2 we  have  to  send 
P 1 ?D 

at  most  n[log  (1  + ^)  - >,  log  (1  + jjp)  + <]  bits  to  the  receiver  in 
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order  to  specify  completely  the  correct  codeword.  But  in  stage  2,  the 
two  transmitters  can  cooperate  totally  since  they  both  know  the  correct 
codeword.  They  both  send  the  same  signal.  Thus,  the  signals  add  co- 
herently at  the  receiver  for  stage  2.  Because  of  the  additive  nature  of 
the  channel,  total  cooperation  between  the  two  senders  allows  reliable 

transmission  (i.e.  with  a probability  of  error  P < e)  up  to  a rate  of 

e? 

1 4P  L 

2 log  (1  + -jjj-)  bits/transmission  i.e.,  the  effective  power  is  now 


( x/p”  + v/p  = 4P  instead  of  P + P = 2P. 

We  define  the  probability  of  failure  Pf  as  the  probability  that  at 

the  end  of  stage  2,  the  receiver  does  not  correctly  estimate  the  messages 

sent  from  the  2 transmitters.  Thus  P,.  ^ Pr{E,}  + P < 2 e , which 

f 1 e^ 

means  the  proposed  scheme  allows  reliable  transmission. 

We  now  calculate  the  effective  achievable  rate.  The  number  of  trans- 
missions required  for  stage  2 is 


n [log  (1  + £)  - ^ log  (1  + ^)  + e] 


1 log  (1  + 


(4.3.14) 


We  recall  that  stage  1 requires  n transmissions,  and  that  the  total 
amount  of  information  conveyed  from  the  two  senders  to  the  receiver  in 

p 

both  stages  is  n log  (1  + bits.  So  the  overall  effective  rate 
R-|  + Rp  is  (as  e -+  0) 


n log  (1  + y) 

n + n [log  (1  + y)  - j log(l  + 2y)]/[]-  log  (1  + 4y)] 

(4.3.15) 
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— —aw 


log  (1  + i)  log  (1  +4,) 

log  (1  + 4-f)  + 2 log  (!  + »•)-  log  (1  + 2^) 


bits/ 

transmission 


(4.3.16) 


where  y = P/N 


Figure  4.4  shows  the  point,  * , which  can  be  achieved  by  the  proposed 
scheme  for  » = 5. 


\ TOTAL  CO-OPERATION  LINE 


V 


0.896 


(0.645,0.645) 


0.304  - CAPACITY 

REGION  WITH 
NO  FEEDBACK 


0.304  0.896 


R. (not*) 


Figure  4.4  - CAPACITY  REGION  OF  AWGN  MULTIPLE-ACCESS  CHANNEL  WITH  y=5. 

RATES  ARE  IN  NATS/TRANSMISSION. 


In  the  asymmetric  case,  suppose  P^  ?2-  We  would  like  the  two 
transmitters  to  learn  each  other's  message  after  an  equal  number  of  trans- 
missions so  that  they  are  ready  to  cooperate  at  the  same  time.  So  we 
choose  to  transmit  mRj  and  mR,,  bits  to  the  receiver  from  transmitters 
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1 P]  1 

1 and  2 respectively  where  R]  = ^ log  (1  + -j^-)  and  $-2  ~ 2 ^ 

Po 

(1  + -j^-)  and  m is  some  large  number.  It  is  not  difficult  to  see  that 
this  scheme  will  yield  a point  outside  the  capacity  region  with  no  feed- 
back. 


4 . 4 JOINTLY  TYPICAL  SEQUENCES 

In  this  section  we  recall  some  basic  results  concerning  typical 
sequences  which  will  be  used  to  find  out  when  feedback  can  increase  the 
capacity  region  of  discrete  memoryless  multiple-access  channels. 

Let  {X^,X^,  ...,X<k>}  denote  a finite  collection  of  discrete 
random  variables  with  some  fixed  joint  distribution  p(x^,x^,  ...,x^). 
Let  S denote  an  ordered  subset  of  these  random  variables  and  consider 
n independent  copies  of  S.  Thus, 


Pr{S  = s}  = n Pr{S-j  = Si } 
i=l 


(4.4.1) 


For  example,  if  S = (x^,X^)  , then 
PrfS  = si  = Pr  |(  X(j\x^  ) = 

= n p ( x{J>.x{k>  ) 


= ( x(j),x(k)  ) j (4.4.2) 


Definition:  The  set  At  of  jointly  e-typical  n-sequences 


( x^  ^ ,x^  , . . . ,x^  j is  defined  by 
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A.  (*(,).*(2) X(k>)  ■|(»<,),x(2) x<k>)  . (,<’>)"* 

(l(2)  )"  X...X  (l(k)  )":  - J-  log  p(s)-H(S) 

i 

■_  t , V S c { X ( 1 ) , X ( 2 ^ X ( k } J 1 (4.4.3) 


where  s denotes  the  ordered  set  of  sequences  in  |x^,...,x^k^  j 

corresponding  to  S.  Let  A (S)  denote  the  restriction  of  A to  the 

► e 

coordinates  corresponding  to  S . Thus,  for  example,  for  S = (X^,X^), 

A ( X(1),X(3)  ) = { ( x(1),x(3)  ) : ! - l log  p ( x(1),x(3)  ) - H (x(1),  X(3))|  < 


- J log  p ( x(1 1 ) - H ( X(  1 } ) 


n lo9  P ( 


U(3))  } 


(4.4.4) 


We  now  recall  three  basic  lemmas.  For  a proof  of  these  lemmas,  see 
Forney  | 30 | and  Cover  j 1 0 | . 


Lemma  4.1:  For  any  0,  there  exists,  an  integer  n such  that  A (S) 

satisfies,  for  all  S _ X'k^  , 


Prl A (S)j  1 - e , 


(ii)  s t A (S)  > - i log  p(s)  - M ( S ) i 

(iii)  (l-> )2n(H(5)*  } | A (S) | • 2n(H(S)  +t 


(4.4.5) 


1 


j^'1'  , I IIVIIIKI 


Lemma  4.2:  Let  the  discrete  random  variables  X,Y  have  joint  distribution 

p(x,y)  . Let  X1  and  Y'  be  independent  but  with  the  same  marginals 

p(x)  = \ p(x,y) 

y 

p(y)  = ^ p(x,y) 
x 

as  X and  Y . 

n n 

Let  (X,Y)  ~ n p(x.,y.)  and  (X* ,Y* ) ~ n p(x.)p(y.)  where  X.  and  Y., 
i = 1 i=i  1 1 1 i 

1 _ i < n are  independent  and  identically  distributed  as  X and  Y 
respectively.  Then 

Pr { ( X 1 , Y ' ) e A (X,Y)}  < 2'n|I(X;Y)  ' e]  (4.4.6) 


Lemma  4.3:  Let  the  discrete  random  variables  X,Y,Z  have  joint 

distribution  p(x,y,z)  . Let  X ’ , Y ’ be  conditional ly  independent  given 
Z , with  the  marginals 


n 

Let  (X,Y,Z)  ~ n p(x.,y.,z.)  and 
i=l  111 

(X'.Y'.Z)  - n p(x. |z.)p(y. [z-)p(z-)  . 

- * - i=i  11  11  1 

Then  Prl ( X' ,Y ' ,Z)  e A (X,Y,Z)  I _ 2'n| I(X;Y'Z^  " ' 1 (4.4.8) 

4 . 5 THE  DISCRETE  MEMORYLESS  MULTIPLE-ACCESS  CHANNEL 

In  this  section,  we  shall  find  conditions  under  which  the  proposed 
scheme  will  increase  the  capacity  region  of  discrete  memoryless  multiple- 
access  channels.  In  particular  we  shall  prove  the  following. 

Theorem  4.1 

Let  and  Q2  denote  probability  distributions  on  X^  and  X^ 

★ ★ 

respectively.  Suppose  that  Q*  = (Q^.Q^)  achieves  the  maximum  of 
I(X1  ,X2;Y)  over  all  product  distributions  Q-|  (x.|  )Q2(x2)  . Then  the 
proposed  feedback  scheme  allows  reliable  transmission  at  an  overall  ef- 
fective rate  of 

Watr:  <4-5J> 


where 


A = IQ*(X1  ;Y|X2)  + IQ*(X2^  X,) 


(4.5.2) 


B = max  I(X-|  ,X2;Y) 

P ( x-j , *2 ) 


(4.5.3) 
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and 


C = IQ*(XrX2;Y) 


is  the  maximum  non-feedback 
sum  rate.  (4.5.4) 


Remark:  The  maximum  in  the  expression  for  B is  taken  over  all 

joint  distributions  of  X-j  and  X^. 

Coro! 1 ary:  Sufficient  conditions  under  which  feedback  will  increase 

the  capacity  region  are  A,B  > C,  i.e., 

(i)  Iq*(Xi;Y|X2)  + Iq*(X2;Y(X1 ) > Iq*(X1 .X2;Y)  (4.5.5) 


(ii)  max  I ( X-, , X2 ; Y ) > Iq*(X]  ,X2;Y) 
P(x]  ,x2) 


(4.5.6) 


Remark:  Notice  that  (ii)  is  a necessary  condition  for  feedback  to 

increase  the  capacity  region. 

Proof  of  Corollary:  We  want  to  show  that  if  A > C and  B > C,  then 


AB 


A+B-C 


> C 


(4.5.7) 


Let  A = C + a and  B = C + 6 where  a, 6 > 0 . 


(4.5.8) 


Then 


i.e. 


C2  + aC  + Cp  + ap  > C2  + aC  + PC 


(C  + a)  (C  + P)  c 
C + a + B 


(4.5.9) 

(4.5.10) 


l .e. 


AB 


A+B-C 
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> C 


I 


We  now  proceed  to  prove  theorem  4.1. 


Consider  a stage  1 randomly  generated  code  of  blocklength  n and 
rates  R-j 2 ’ ^2 1 Stained  as  follows.  Each  codeword  x.,  i = 1 ,2  is  an 
n-sequence  of  independent  outcomes  of  a random  variable  distributed  ac- 

^ nR-j  2 nR^i 

cording  to  Q,  . We  generate  2 independent  x,'s  and  2 

r nR12l 

independent  x^'s.  Let  these  be  indexed  as  x^(k),  k e jj  ,2  [ and 


x2(0  , l e 1,2 


Let  K,L  be  independent  random  variables  drawn  according  to  uniform 


distributions  on  1 1 ,2 


and  1 ,2 


respectively.  Let  the 


code  be  chosen  randomly  from  the  above  ensemble.  As  in  section  4.3,  we 
first  proceed  to  upper  bound  the  average  probability  Pr<E^i  that 
stage  1 is  unsuccessful.  By  the  symmetry  induced  by  the  random  coding, 
we  see  that  each  transmitted  message  ( k , 2. ) yields  the  same  probability 
of  error.  So  hence  forth  we  shall  assume  that  the  actual  transmitted 
message  is  (1,1)  . 

The  decoding  rule  for  estimating  the  message  of  the  first  transmitter 
at  the  second  transmitter  (at  the  end  of  stage  1)  is  as  follows.  If  y 
is  received,  declare  k = k was  sent  if  and  only  if  there  is  only  one 


k c j 1,2  j such  that  (^ (k) ,x?(l ) ,y)  are  jointly  typical.  Using 

lemmas  4.1  and  4.3,  it  can  be  shown  (for  details  see  [10])  that  k can 
be  decoded  with  arbitrarily  small  probability  of  error,  say, 

Pe  < V4  if 
el  ,2  4 

R1 2 < IQ*  (X1 ;Y|X2)  (4.5.11) 
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and  n is  sufficiently  large.  Similarly  i can  be  estimated  at  the 
first  transmitter  with  arbitrarily  small  probability  of  error,  say  , 


■1,1 


V a if 


R21  - IQ*  (X2;Y|X1 ) 


(4.5.12) 


We  now  consider  the  set  S of  codewords  which,  at  the  end  of  stage  1, 

•Z. 

are  jointly  typical  with  y . From  lemma  4.1,  we  know  that  (x^ (1 ) ,x2 ( 1 ) ) 
will  be  jointly  typical  with  y with  high  probability,  say. 


Pc  > 1 - U 


(4.5.13) 


Let 


'k,.(*>  - 


1 , (x1(k),x2(f.),y)  are  typical 


0 , otherwise 


(4.5.14) 


Then  I S [ = E \ Ay) 
i k,i 


(4.5.15) 


E|S  I = E E*.  (y)  + E E*k  (y) 

* k=l  ,z=l  K,t-  k?l  ,£=1 


E (y)  + E E*.  (y)  (4.5.16) 

k=l  ,1/1  K’z  ‘ Ml, Ml  K’* 


Using  lemmas  4.2  and  4.3,  it  can  be  shown  that 


-n  I In*(X, ,X?;Y)  - e 

E*ki4(y)<2  L4 


, Ml,  Ml 


(4.5.17) 
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-n  ! In*(X9;Y|X1)  - e] 

2 10  2 1 Ji  w> 


(4.5.18) 


E-k^<y)  < 2 L 


-n  | IQ*(X1 ;Y| X2)  - e 


I , Ml , p-1 


(4.5.19) 


Therefore 


ElS^I  _ 1 * (/V,)  (2"V,)  2-"  [V'W'I  - 
* (2"V,)  2'"  [ IQ*(X1;,|X2)  - '] 

+ (2"V,  ) 2‘n  [W^V  - <« 


(4.5.20) 


Taking  the  limit  in  (4.5.11)  and  (4.5.12)  we  can  set  R]2  = 
Ig*(Xj;Y  X2)  and  R^j  = Ig* (X^ ; Y | Xj ) . Then  (4.5.20)  becomes 


E j S i 1 + 2- 2nt  + 2P  • IQ*(X1;Y:X2)  + !Q*(X2 ;Y i X1 ) ~ IQ*(X1’X2;Y)  + L ] 
* (4.5.21) 


n(D  + t-. ) 

> 


(4.5.22) 


where  D - IQ*(X1  ; Y | X2 ) + IQ*(X2;Y|X1 ) - IQ*(XrX2;Y)  (4.5.23) 


Using  Markov’s  inequality  we  obtain 


n A n ( , , n>2  n(D  + t,) 

Pb  = Pr  { lSyl  " 2 2 1 


U say 


(4.5.24) 


(4.5.26) 


Since  Pr  |e^|<  e>  there  must  exist  at  least  one  code  with  an  average 

probability  of  failure  (taken  over  K,L)  in  stage  1 less  than  e . 

Assuming  the  first  stage  is  successful,  in  stage  2 we  have  to  send 

at  most  n(D  + ) bits  to  the  receiver  in  order  to  completely  specify 

the  correct  code  word.  In  stage  2,  total  cooperation  between  the  two 

transmitters  allows  transmission  up  to  a rate  of  max  I(X,  ,X?;Y)  with  ar- 

P ( *i »*2 ) 


bitrarily  small  probability  of  error,  say  P 


Thus  the  probability 


of  failure  < Pr 


K1 


+ P„  < 2e  , which  means  that  the  scheme 


allows  reliable  transmission.  A straight  forward  calculation  shows 
that  the  overall  effective  sum  rate  using  this  scheme  is 


Rl  + R2  - 


! Iq*(X1;Y|X2)  + IQ*(X2;Y| X] ) max  I(X] ,X2;Y) 

P ( *i ) 

max  I(X, ,X2;Y)  + I0*(X,;Y|X2)  + IQ*(X2;Y] X] ) - I0*(X,,X2;Y) 
p(x-|,x2)  M 

(4.5.27) 


This  completes  the  proof  of  theorem  4.1. 
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4 . 6 CONCLUD ING  REMARKS 

The  capacity  region  of  multiple-access  channels  with  feedback  can 
generally  be  increased  by  the  scheme  of  transmitting  first  at  high  rates 
until  both  transmitters  know  each  other's  messages,  then  cooperatively 
at  a lower  rate  to  resolve  the  remaining  receiver  ambiguity.  An  open 
problem  is  to  determine  the  full  capacity  region  with  feedback  and  con- 
ditions, if  any,  under  which  the  above  scheme  is  optimal.  Finally  we 
might  mention  that  feedback  can  increase  the  capacity  region  of  inter- 
ference channels  which  are  channels  with  an  equal  number  of  senders  and 
receivers,  and  in  which  each  sender  wishes  to  communicate  with  a corre- 
sponding receiver. 


CHAPTER  5 


CONCLUSIONS 


In  chapter  2,  we  introduced  the  additive  white  Gaussian  noise  wire- 
tap channel  and  explicitly  determined  the  set  5!*  of  all  achievable 
rate-equivocation  pairs.  It  turns  out  that  the  secrecy  capacity  C$ 
is  the  difference  between  the  capacities  of  the  main  and  wiretapper's 
channels,  and  increases  monotonical ly  as  the  input  power  constraint  is 
relaxed.  The  quantity  C^  almost  completely  specifies  51*  . Some 
useful  characterizations  of  a special  class  of  wiretap  channels  are 
mentioned  in  section  2.4. 

Chapter  3 dealt  with  the  wiretao  channel  with  feedback.  It  is 
shown  that  with  the  introduction  of  feedback,  even  when  the  main  channel 
is  inferior  to  the  wiretapper's  channel,  it  is  still  possible  to  send 
reliably  at  a positive  rate  to  the  legitimate  receiver  while  at  the  same 
time  keeping  the  wiretapper  in  total  ignorance.  The  binary  erasure  wiretap 
channel  with  feedback  is  studied  in  detail  and  inner  and  outer  bounds  on 
the  achievable  (R,d)  region  are  given. 

In  Chapter  4 we  analyzed  a scheme  for  enlarging  the  capacity  region 
of  multiple-access  channels  using  feedback.  It  is  shown  that  the  capacity 
region  can  generally  be  increased  by  the  scheme  of  transmitting  first  at 
high  rates  until  both  receivers  know  each  other's  messages,  then  co- 
operatively at  a lower  rate  to  resolve  the  remaining  receiver  ambiguity. 

Among  the  problems  which  deserve  further  study,  we  may  mention  the 
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The  formulation  of  a good  coding  scheme  for  use  on  wiretap 
channels  with  feedback. 

The  proof  of  a tight  outerbound  on  the  achievable  (R,d) 
region  for  wiretap  channels  with  feedback. 

The  determination  of  the  capacity  region  of  multiple-access 
channels  with  feedback. 

The  effect  of  noisy  feedback  in  the  problems  examined  in  the 
preceeding  chapters. 

The  extension  of  the  notions  of  privacy,  security  and  secrecy 
to  networks  involving  several  senders  and  receivers.  This 
would  lead  to  a generalization  of  the  wiretap  channel. 


APPENDIX  I 


In  this  appendix,  we  show  that  for  the  example  in  which  the  main 
channel  and  the  wiretap  channel  are  binary  symmetric  channels  with 
crossover  probabil ities  0 and  e respectively,  r(R)  as  defined  in 
(2.1.5)  is  equal  to  h(e).  See  Wyner  [19]. 


Proof: 

I ( X ; Y | Z ) = H ( X j Z ) - H ( X | Y , Z ) (A. 1.1) 

= H ( X j Z ) - H ( X | Y ) (A. 1.2) 

(A. 1.2)  holds  because  X and  Z are  independent  conditioned  on  Y . 
Since  the  main  channel  is  noiseless,  we  have  that  H(X)Y)  = 0.  So 


I ( X ; Y | Z ) = H(X)  - H(Z)  + H(Z|X) 

H(Z  | X)  = £ p(x,z)  log  -rV  -t 

x,Ze{0,l } PU|X; 

= e log  ]■  + (1-c)  log 

A 

= h(e) 


(A. 1.3) 
(A. 1.4) 

(A. 1.5) 
(A. 1.6) 


Therefore,  in  order  to  maximize  I(X;Y|Z),  we  need  to  maximize 
H(X)  - H(Z). 

We  now  recall  a result  which  will  be  useful  in  lower-bounding 
H(Z)  - H(X) . 
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'r  ~ t*. 


Leinnia  A.  1 . 1 

Let  X and  Z denote  the  input  and  output  respectively  of  a 
binary  symmetric  channel.  Then 


H(Z ) > H ( X ) 


Proof:  Follows  from  Mrs.  Gerber's  Lemma  [22]. 


(A. 1.7) 


At  Qg  = ^ , H(Z)  = H(X)  . Therefore  the  maximum  value  of 
I(X;Y'Z)  is  h(t.)  and  is  attained  when  Pr { X=0 > = Pr { X=1  } = ^ . But 
this  input  distribution  also  achieves  the  capacity  of  the  main  channel 
Thus, 


r(R)  = h(t ) , 0 < R < 1 . 


(A. 1.8) 


APPENDIX  II 


We  give  examples  of  two  frequently  encountered  constant  r(R) 
wiretap  channels. 


Example  1 

Referring  to  figure  2.1,  let  the  main  channel  be  a noiseless 
binary  channel  and  the  wiretap  channel  be  a binary  erasure 
channel  (BEC)  with  erasure  rate  £ . 

Following  (A. 1.3)  we  can  write 


I ( X ; Y | Z ) = H ( X ) - H(Z)  + H(Z|X)  (A. 2.1) 

Let  Pr { X=0 } = <1q  and  Pr{X=l>  = 1 - q q . A straightforward 
calculation  yields 

H ( Z | X ) = h(e)  (A. 2. 2) 

Therefore,  to  maximize  I ( X ; Y | Z ) it  is  sufficient  to  maximize 
H ( X ) - H(Z)  . 

H(X)  - H(Z)  = - qQlog  qQ  - (l-qQ)  log  (l-qQ)  + qQ  (1-e)  log  qQ  (1-e) 

+ £ log  £ + (l-q0)  (1-e)  log  (l-q0)  (1-e)  (A. 2. 3) 

Differentiating  (A. 2. 3)  with  respect  to  qg  and  carrying  out  the 
algebra  yields 

d (H(X)  -H(Z»  , e,  (fl  2 4) 

dq0  ' 10  ' 
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Setting  (A. 2. 4)  equal  to  zero,  we  find  that  qg=  ^ corresponds  to 
a stationary  point.  Examination  of  the  second  derivative  shows  that 
the  point  q^  = ^ is  a maximum.  Thus, 

I ( X ; Y | Z ) = 1 - [ 1-e  + h(t)  ] + h(e) 

= E 

Since  q q = ^ is  also  the  capacity  achieving  distribution  of  the  main 
channel,  we  conclude  that 

r(R)  = e , 0 ^ R < CM  = 1 . (A. 2. 5) 

Figure  A.l  shows  the  complete  achievable  (R,d)  region  for  the  above 
example. 


Figure  A.l  - (R,d)  REGION  FOR  BE  WIRETAP  CHANNEL 
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Example  2 (see  figure  2.1) 

We  consider  the  case  in  which  the  main  and  wiretap  channels  are 
binary  symmetric  channels  (BSC)  with  crossover  probabilities  ey  and 
e2  respectively. 

From  (A. 1 .2)  we  have 


I ( X ; Y | Z ) = H ( X | Z ) - H ( X | Y ) 


(A. 2. 6) 


= [H( X)  + H(Z  | X)  - H(Z) ] - [H( X)  + H(Y)X)  - H(Y)  ] (A. 2. 7) 


= H ( Z | X ) - H ( Y | X ) - IH(Z)  - H( Y) ] 


(A  2.8) 


Definition  A.l:  For  0 < a,  b < 1,  define 


a * b = a (1-b)  + b (1-a) 


(A. 2. 9) 


Then  it  can  be  shown  that 


H(Y|X)  = h(ex ) 


(A. 2. 10a) 


H(Z|X)  = h (e!  * e2) 


(A. 2. 10b) 


Therefore,  the  problem  of  maximizing  I ( X ; Y | Z ) is  equivalent  to  that 


of  minimizing  H(Z)  - H(Y)  . From  lemma  A. 1.1  we  know  that 


H(Z)  > H(Y)  . 


(A. 2. 11) 


Also  the  input  distribution  Pr { X=0}  = Pr{X=l)  = ^ achieves  H(Z)  = H(Y). 
But  this  distribution  achieves  the  capacity  of  the  main  channel  as  well. 
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